Scientific information sources accessible free of charge [email protected] Vrije Universiteit Brussel + Information and Library Science, University of Antwerp Belgium Presented at the follow-up conference organised by.


Slide 1

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
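
The "simply stated" rule above (more links received = higher rank) can be sketched in a few lines of Python. This is only an illustration under assumptions: the site names and link pairs are made up, and the real ranking used by Google is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs.
LINKS = [
    ("site-a", "site-c"),
    ("site-b", "site-c"),
    ("site-d", "site-c"),
    ("site-a", "site-b"),
    ("site-c", "site-b"),
    ("site-c", "site-a"),
]

def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of links they receive (descending).

    Ties are broken alphabetically, as in a plain alphabetical listing.
    """
    inlinks = Counter(target for _source, target in links)
    return sorted(entries, key=lambda site: (-inlinks[site], site))

print(rank_by_inlinks(["site-a", "site-b", "site-c", "site-d"], LINKS))
# prints ['site-c', 'site-b', 'site-a', 'site-d']
```

An alphabetical listing would show site-a first; the link count moves site-c, which receives three links, to the top.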

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: the complete WWW; a global subject directory can lead to (a) a directory limited to sources in/of a country or region, or (b) a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
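
The two parts described above (an automatically built index plus a search function) can be illustrated with a minimal Python sketch. It assumes that a "robot" has already collected a few pages; the URLs and texts below are invented for illustration, and this is not the implementation of any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
# prints ['http://example.org/a']
```

The query is answered from the prepared index only; the pages themselves are not visited at query time, which is why such systems can respond quickly.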

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
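
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched as an iterative score computation. The Python sketch below is a simplified textbook-style illustration, not Google's actual algorithm; the damping value 0.85, the number of iterations and the four-page graph are assumptions chosen for the example.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively compute an importance score per page.

    graph maps each page to the list of pages it links to.
    A page scores higher when many pages, or high-scoring pages, link to it.
    """
    pages = list(graph)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores

# Hypothetical graph: pages a, b and d all link to c; c links to a.
GRAPH = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
SCORES = link_scores(GRAPH)
print(max(SCORES, key=SCORES.get))
```

In this graph, page c receives the most links and gets the highest score, and page a scores above b and d because its single incoming link comes from the high-scoring page c.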

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
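
What such an automatic expansion amounts to can be sketched in Python. The synonym table below is hypothetical and tiny, and the sketch only shows the principle: how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word marked with a leading tilde (~) into an OR group."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("rent ~car brussels"))
# prints rent (car OR automobile OR vehicle) brussels
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.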

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• Since 2004, these have all been owned by the same
U.S. company, Yahoo!

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered an independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
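
The two quality measures named above can be made concrete with a short Python sketch. Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of all relevant documents that were retrieved. The document identifiers below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved and relevant document sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d9"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6"]
print(precision_recall(retrieved, relevant))
# prints (0.75, 0.5)
```

A specialised system can improve both numbers at once: it indexes more of the relevant documents in its domain (higher recall) and fewer unrelated ones (higher precision).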

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
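
For readers unfamiliar with truncation, the following Python sketch shows what the feature does in a retrieval system that supports it. The trailing asterisk as truncation symbol is an assumed (though common) convention, and the small vocabulary is invented for the example.

```python
def matches_truncated(term, word):
    """Return True when word matches term; a trailing '*' truncates the term."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def expand_truncation(term, vocabulary):
    """List the index words that a (possibly truncated) query term matches."""
    return sorted(w for w in vocabulary if matches_truncated(term, w))

# Hypothetical index vocabulary.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "fish", "fishing"]
print(expand_truncation("librar*", VOCABULARY))
# prints ['librarian', 'libraries', 'library']
```

Without truncation, the searcher has to type each of these word forms separately to cover the same concept.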

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
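
A proximity operator such as NEAR checks whether two words occur close to each other in a document. A minimal Python sketch of that check follows; the default distance of 5 words and the sample sentence are assumptions made for illustration.

```python
import re

def near(text, word1, word2, max_distance=5):
    """Return True when word1 and word2 occur within max_distance words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

TEXT = "Open access journals are increasingly used in marine science and in hydrology."
print(near(TEXT, "marine", "hydrology"))  # prints True
print(near(TEXT, "open", "hydrology"))    # prints False
```

Both word pairs would satisfy a plain AND query on this sentence; only the proximity check distinguishes the pair that occurs close together.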

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
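
To give an impression of what clustering search results "on the fly" means, here is a toy Python sketch. It is NOT the Vivisimo algorithm, which is not described in these slides; it simply groups result titles under the word that each title shares with the most other titles. The titles and the stop word list are invented, using the "pascal" example mentioned for Teoma.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared word of each title."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    token_sets = []
    for title in titles:
        tokens = set(re.findall(r"[a-z0-9]+", title.lower()))
        token_sets.append(tokens - query_words - STOP_WORDS)
    # Count in how many titles each candidate heading word occurs.
    doc_freq = Counter(word for tokens in token_sets for word in tokens)
    clusters = defaultdict(list)
    for title, tokens in zip(titles, token_sets):
        if tokens:
            # Most shared word wins; ties are broken alphabetically.
            heading = min(tokens, key=lambda w: (-doc_freq[w], w))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Pascal programming tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming language",
    "The philosopher Pascal and his wager",
]
for heading, group in cluster_results("pascal", TITLES).items():
    print(heading, "->", group)
```

The grouping is computed from the retrieved results only, after the search, so no pre-processing of the document collection is required.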

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

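Meta-search systems: client-based:
scheme
[Diagram: the user works at a client computer running a multi-threaded Internet search client program; queries go in and results come out over the Internet / WWW, which connects the client directly to the WWW server computers with Internet search systems.]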

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
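
The time saving comes from sending one query to several systems at the same time. A minimal sketch, assuming two hypothetical search functions (`search_system_1`, `search_system_2`) that stand in for real primary search systems; a real client would send the query over HTTP and parse each answer.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for primary search systems (assumption, not real services).
def search_system_1(query):
    return [f"http://example.org/a?{query}", f"http://example.org/b?{query}"]


def search_system_2(query):
    return [f"http://example.org/b?{query}", f"http://example.org/c?{query}"]


def meta_search(query, systems):
    """Send one query to every system in parallel; return the results per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        # Submit all searches first, so that they run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        # Then wait for each answer.
        return {name: future.result() for name, future in futures.items()}


results = meta_search(
    "plankton",
    {"system 1": search_system_1, "system 2": search_system_2},
)
```

The user formulates the query once, in one interface, and the waiting time is roughly that of the slowest system instead of the sum of all systems.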

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
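
Such an integration with removal of repeated results can be sketched as follows. The interleaving strategy and the function name `merge_results` are assumptions for illustration; actual meta-search systems may rank and merge quite differently.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                # Simplification: URLs that differ only by a trailing slash
                # are treated as the same page.
                key = lst[rank].rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged


merged = merge_results([
    ["http://example.org/a", "http://example.org/b/"],
    ["http://example.org/b", "http://example.org/c"],
])
```

Each list keeps its own ranking order, and a page reported by two systems is shown only once, at the best rank at which it was found.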

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet, with the WWW as one part of it.
• telnet, ftp, ...
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP, ...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
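
Tracking modifications in an existing page can be illustrated with a minimal sketch that stores a digest of each page and compares it at the next check. The names `fingerprint` and `changed_pages` and the sample URL are assumptions; fetching the pages over the network is left out so that the example stays self-contained.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(stored, current_pages):
    """Compare fresh page texts with stored digests; return the changed URLs."""
    changed = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if stored.get(url) != digest:
            changed.append(url)
        stored[url] = digest  # remember the latest version for the next check
    return changed


stored_digests = {}
first = changed_pages(stored_digests, {"http://example.org/news": "version 1"})
second = changed_pages(stored_digests, {"http://example.org/news": "version 1"})
third = changed_pages(stored_digests, {"http://example.org/news": "version 2"})
```

A page that is checked for the first time is reported once, an unchanged page is not reported, and a modified page is reported again.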

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
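
How a stored query can lead to alerts about new pages is sketched below. This is not a description of how Google Alert is implemented; it only shows the general principle of running a stored query again and reporting what was not reported before. The helper `fake_index` is a hypothetical stand-in for an external Internet index.

```python
def new_hits(stored_query, previous_urls, run_search):
    """Run a stored query again and return only URLs not reported before."""
    current = run_search(stored_query)
    fresh = [url for url in current if url not in previous_urls]
    previous_urls.update(fresh)  # these will not be reported a second time
    return fresh


# Hypothetical search function standing in for an external Internet index.
def fake_index(snapshot):
    return lambda query: [url for url in snapshot if query in url]


seen = set()
monday = new_hits("tides", seen, fake_index(["http://example.org/tides-1"]))
tuesday = new_hits(
    "tides",
    seen,
    fake_index(["http://example.org/tides-1", "http://example.org/tides-2"]),
)
```

In a real service the list returned by `new_hits` would be sent to the subscriber by email.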

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
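
Matching new books against a stored interest profile can be sketched as follows. The record layout (`title`, `subjects`), the function name `matching_books` and the sample records are assumptions for illustration, not the format of any particular bookshop.

```python
def matching_books(profile_keywords, new_books):
    """Return the titles of new books whose title or subjects match the profile."""
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for book in new_books:
        text = (book["title"] + " " + " ".join(book["subjects"])).lower()
        # Simple substring matching; a real system may match whole words
        # or subject categories instead.
        if any(k in text for k in keywords):
            alerts.append(book["title"])
    return alerts


# Invented sample records.
new_books = [
    {"title": "Coastal Ecology", "subjects": ["marine biology"]},
    {"title": "Medieval Trade", "subjects": ["history"]},
]
alerts = matching_books(["marine"], new_books)
```

The profile stays on the server, and the match is repeated each time new book records are added to the database.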

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
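
This use of a thesaurus before searching can be sketched as a simple query expansion. The mini-thesaurus below is invented for illustration and uses the relation codes BT, NT, RT and UF; the function name `expand_query` and the Boolean OR syntax are assumptions, because the actual syntax depends on the database to be searched.

```python
# Hypothetical mini-thesaurus (invented entries, for illustration only).
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal sea"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term plus selected thesaurus relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Put multi-word terms between quotes so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
# query is: sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms is aimed at recall; adding broader terms as well (`relations=("UF", "NT", "BT")`) would retrieve more documents but probably at the cost of precision.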

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
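
Both directions of citation searching can be sketched on a small, invented set of citation data. The document identifiers and the function names `snowball` and `citing` are assumptions for illustration; a real citation index holds many more records.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["older-1"],
    "new-1": ["seed"],
    "new-2": ["seed", "old-1"],
}


def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found


def citing(doc, cites):
    """Look up the later documents that cite `doc`, as a citation index does."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older = snowball("seed", CITES)   # older publications, via the reference lists
newer = citing("seed", CITES)     # more recent publications, via the index
```

The snowball search needs only the documents themselves, whereas finding the citing documents requires an index that has processed the reference lists of many publications.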

211

***-

Information retrieval:
using citations (scheme)
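[Diagram: a time axis running from the past, through now, to the future, with the seed document placed on it. Snowball citation searching leads from the seed document back to the past; citation indexing leads from the seed document forward in time.]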

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
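
Estimating this number from the output of a citation search can be sketched as a simple count. The records and author names below are invented, and the function name `citations_per_author` is an assumption; real counts depend strongly on the coverage of the database that is searched.

```python
from collections import Counter

# Hypothetical records from a citation search: (citing document, cited author).
CITATIONS = [
    ("paper-A", "Peeters"),
    ("paper-B", "Peeters"),
    ("paper-B", "Maes"),
    ("paper-C", "Peeters"),
    ("paper-C", "Peeters"),  # duplicate record, counted only once
]


def citations_per_author(citations):
    """Count for each author the number of distinct citing documents."""
    return Counter(author for _, author in set(citations))


counts = citations_per_author(CITATIONS)
```

The result is an estimate only: citations from documents that are not covered by the database are missed, and authors with the same name are not distinguished.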

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
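The variants listed above can be generated mechanically before running a link search. The following is a minimal sketch in Python, assuming only the three patterns shown on this slide; the function name `url_variants` and the default page names are illustrative assumptions, not part of any search system.

```python
def url_variants(url):
    """Return the three address variants that a link search should cover."""
    # Drop the scheme, so that "http://host/..." and "//host/..." both work.
    stem = url.split("//", 1)[-1]
    # Strip an explicit default page name (assumed names, for illustration).
    for default_page in ("index.html", "index.htm"):
        if stem.endswith("/" + default_page):
            stem = stem[: -len(default_page)]
            break
    stem = stem.rstrip("/")
    return [f"//{stem}/index.html", f"//{stem}/", f"//{stem}"]
```

Each returned variant would then be submitted as a separate link query, using the syntax of the chosen search system.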

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
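As an illustration of this "normal" searching, the sketch below wraps a known URL in double quotes, so that it is treated as an exact phrase, and encodes it for use in a request address. The search form address passed as `base` is hypothetical, and whether a given search system honours quoted phrases is an assumption to check in its help pages.

```python
from urllib.parse import quote_plus

def phrase_query(url):
    """Wrap a URL in double quotes so that it is searched as an exact phrase."""
    return f'"{url}"'

def search_request(base, url):
    """Build the request address for a hypothetical search form at `base`."""
    return f"{base}?q={quote_plus(phrase_query(url))}"
```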

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
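The idea of ordering directory entries by received links can be sketched as follows. This is only an illustration under simple assumptions (every link counts equally, self-links are ignored, ties fall back to alphabetical order); it is not the link analysis actually used by Google, and the function name `rank_by_inlinks` is hypothetical.

```python
from collections import Counter

def rank_by_inlinks(entries, links):
    """Order directory entries by how many distinct pages link to them."""
    inlinks = Counter()
    for source, target in set(links):   # count each (source, target) pair once
        if source != target:            # ignore links from a site to itself
            inlinks[target] += 1
    # Most-linked first; ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-inlinks[site], site))
```

For example, with entries `a.org`, `b.org` and `c.org`, where two pages link to `c.org` and one page links to `b.org`, the order becomes `c.org`, `b.org`, `a.org`, instead of the purely alphabetical order.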

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
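The two components described above (a robot that collects pages into an indexed database, and a search function that consults only that database) can be sketched on a toy scale. Everything here is an assumption made for illustration: the three pages in `PAGES` stand in for the WWW, no network access takes place, and a query requires all of its words to be present.

```python
import re
from collections import defaultdict, deque

# Toy stand-in for the WWW: URL -> (text, outgoing links). Purely illustrative.
PAGES = {
    "http://a.example/": ("marine biology of the North Sea", ["http://b.example/"]),
    "http://b.example/": ("hydrology and marine science",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("fishing in the North Sea", []),
}

def crawl(start, pages):
    """The 'robot': follow links from `start` and collect every reachable page."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        yield url, text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(crawled):
    """Build the index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in crawled:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Answer a query from the index only; no page is fetched at query time."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

Because `search` looks only at the prebuilt index, a page that changed or appeared after the last crawl is not found, which is the limitation described in the slides above.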

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi, has been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
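The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified, PageRank-style iteration. This is a sketch under stated assumptions and not Google's actual algorithm: the damping value, the number of iterations and the function name `link_rank` are chosen here for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links shares its score with all pages.
            targets = links.get(page) or pages
            share = damping * score[page] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score
```

In the small example `{"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}`, pages `a` and `b` each receive exactly one link, but `a` ends up with the higher score because its link comes from the well-linked page `c`.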

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
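Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus `THESAURUS` and the function `expand_query` are hypothetical, and the `( ... OR ... )` grouping is only an assumed query syntax; the syntax that a particular search system accepts should be checked in its help pages.

```python
# Hypothetical mini-thesaurus; a real one would be far larger.
THESAURUS = {
    "ocean": ["sea"],
    "fishing": ["fishery", "angling"],
}

def expand_query(query, thesaurus):
    """Replace each word that has thesaurus entries by an OR-group of alternatives."""
    parts = []
    for word in query.split():
        alternatives = [word] + thesaurus.get(word.lower(), [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(word)
    return " ".join(parts)
```

With these assumed entries, the query `ocean fishing policy` becomes `(ocean OR sea) (fishing OR fishery OR angling) policy`, which covers the search concept more broadly than the original three words.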

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
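To make two of the missing features concrete, the sketch below shows what truncation (a query such as fish*) and a proximity operator (such as NEAR) do when applied to a piece of text. The function names and the default distance of 3 words are assumptions for illustration; they do not reproduce the behaviour of any particular retrieval system.

```python
import re

def tokens(text):
    """Split a text into lower-case words."""
    return re.findall(r"\w+", text.lower())

def truncation_match(text, stem):
    """True if any word in `text` starts with `stem` (a query such as fish*)."""
    return any(word.startswith(stem.lower()) for word in tokens(text))

def near_match(text, word1, word2, distance=3):
    """True if `word1` and `word2` occur at most `distance` words apart."""
    words = tokens(text)
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)
```

With truncation, the stem `fish` also retrieves a text that mentions only "Fisheries"; without it, the searcher has to type every word variant into the query.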

96

**--

Meta-search systems:
scheme 1
[Diagram elements: User (In / Out); client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram elements: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram elements: User (In / Out); client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
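The relations above can be sketched in code: a meta-search system sends one query to several search systems and merges their ranked result lists. The merging rule used here (each system gives a result a vote of 1/rank) is an assumption chosen for simplicity, not the method of any named meta-search system, and the engines are represented by hypothetical stand-in functions.

```python
def meta_search(query, engines):
    """Send `query` to every engine and merge the ranked result lists.

    `engines` maps an engine name to a function: query -> ranked list of URLs.
    """
    scores, found_by = {}, {}
    for name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank   # reciprocal-rank vote
            found_by.setdefault(url, []).append(name)
    # Highest combined vote first; ties are ordered alphabetically by URL.
    ordered = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, found_by[url]) for url in ordered]
```

A result that is returned by several systems appears only once in the merged list, together with the names of the systems that found it.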

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
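
To illustrate what clustering "on the fly" can mean, here is a toy sketch in Python that groups the titles of retrieved results after the search, without any index prepared in advance. It is NOT the method used by Vivisimo, which is not described here; the heuristic, the stopword list and the sample titles are all assumptions made for illustration:

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to"}


def cluster_results(titles, query):
    """Group result titles under the word they share with most other titles.

    Toy heuristic: each title is filed under its most widely shared word that
    is neither a stopword nor a query word; other titles go to 'other'.
    """
    query_words = set(query.lower().split())
    tokenised = []
    doc_freq = defaultdict(int)  # number of titles in which each word occurs
    for title in titles:
        words = {w for w in title.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
        for w in words:
            doc_freq[w] += 1

    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        shared = [w for w in words if doc_freq[w] > 1]
        # Pick the word shared by most titles; break ties alphabetically.
        label = min(shared, key=lambda w: (-doc_freq[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


hits = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
    "Jaguar",
]
clusters = cluster_results(hits, "jaguar")
for label, members in clusters.items():
    print(label, members)
```

With these sample titles the headings become "car", "rainforest" and "other", which shows how one ambiguous query word can be split into groups that the searcher can select from.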

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
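
The time saving comes from sending one query to several search systems in parallel instead of one after the other. A minimal sketch in Python of such a multi-threaded client; the two search functions are hypothetical stand-ins that return fixed lists, because the real interfaces of the search systems are not described here:

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_index_a(query):
    return ["http://example.org/a1", "http://example.org/shared"]


def search_index_b(query):
    return ["http://example.org/shared", "http://example.org/b1"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel and collect the hits."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the results in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return {engine.__name__: hits for engine, hits in zip(engines, result_lists)}


results = meta_search("open access", [search_index_a, search_index_b])
for name, hits in results.items():
    print(name, hits)
```

In a real client each function would contact a remote server, so the total waiting time would be close to that of the slowest system rather than the sum of all of them.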

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
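
The integration of results with removal of repeated results can be sketched as follows in Python. This is only one possible approach, under the assumption that two hits are "the same" when their URLs differ only in letter case of the host, a leading "www." or a trailing slash; real meta-search systems may use other rules:

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(hits) for hits in result_lists), default=0)
    for rank in range(longest):
        for hits in result_lists:
            if rank < len(hits):
                key = normalise(hits[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(hits[rank])
    return merged


merged = merge_results([
    ["http://www.doaj.org/", "http://www.scirus.com"],
    ["http://doaj.org", "http://www.copac.ac.uk/"],
])
print(merged)
```

The round-robin order is a design choice that gives each primary search system an equal voice; a system that trusts one source more could weight the ranks instead.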

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
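
Tracking modifications of an existing page can be done by storing a fingerprint of the page and comparing it at the next visit. A minimal sketch in Python, assuming that the text of the page has already been fetched; the two sample pages are invented, and this is not a description of how any particular alert service works:

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the digest kept from the last visit with the current page."""
    return fingerprint(current_text) != stored_fingerprint


old_page = "<html><body>Conference programme: to be announced</body></html>"
new_page = "<html><body>Conference programme: now online</body></html>"

stored = fingerprint(old_page)
print(has_changed(stored, old_page))  # False
print(has_changed(stored, new_page))  # True
```

Only the short fingerprint has to be kept between visits, not the full page. Finding completely new pages is a different task, because it requires a search in an Internet index rather than a comparison with a known page.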

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
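
Matching new books against a stored interest profile can be sketched as follows in Python. The structure of the profile (a set of keywords plus a set of subject categories) follows the two forms named above; the field names and the sample books are hypothetical, and the sketch does not describe the system of any particular bookshop:

```python
def matches_profile(book, profile):
    """True when a new book fits the stored keywords or subject categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


profile = {
    "keywords": {"oceanography", "aquaculture"},
    "categories": {"Marine biology"},
}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Fish of the North Sea", "category": "Marine biology"},
    {"title": "A History of Printing", "category": "History"},
]

alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
```

The first book is selected through a keyword, the second through its subject category, and the third is not reported to the user.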

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
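
The use of a thesaurus to formulate a richer query can be sketched in Python. The thesaurus fragment below is invented for illustration and uses the relation codes BT, NT, RT and UF; the choice to add only synonyms (UF) and narrower terms (NT) by default is an assumption, made because broader terms tend to increase recall at the cost of precision:

```python
# A tiny, invented thesaurus fragment using the relation codes BT, NT, RT, UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_with_thesaurus(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so they are searched as phrases.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


query = expand_with_thesaurus("sea")
print(query)  # sea OR ocean OR "coastal sea"
```

A term that is not in the thesaurus is returned unchanged, so the function can be applied to every word of a query without special cases.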

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
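
Both directions of citation searching can be sketched in Python with a small, invented set of documents. The reference lists give the way back in time (snowball effect); a citation index is the same information inverted, which gives the way forward in time. The document identifiers are hypothetical:

```python
from collections import defaultdict

# Invented example: each document maps to the older documents that it cites.
REFERENCES = {
    "seed-2000": ["a-1995", "b-1998"],
    "b-1998": ["a-1995", "c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, references):
    """Follow reference lists back in time and collect every older document reached."""
    found, to_visit = [], [seed]
    while to_visit:
        document = to_visit.pop(0)
        for cited in references.get(document, []):
            if cited not in found:
                found.append(cited)
                to_visit.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = defaultdict(list)
    for citing, cited_documents in references.items():
        for cited in cited_documents:
            index[cited].append(citing)
    return dict(index)


older = snowball("seed-2000", REFERENCES)
newer = build_citation_index(REFERENCES)["seed-2000"]
print(older)  # ['a-1995', 'b-1998', 'c-1990']
print(newer)  # ['d-2003', 'e-2004']
```

The length of the list of citing documents is the number of citations received by the seed document within this small collection.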

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
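
The variants listed above can be generated automatically before running a link search. This is a minimal sketch; the function name is hypothetical, and it assumes that the default page of the site is called index.html, which depends on the web server.

```python
def url_variants(url: str) -> list[str]:
    """Return the spellings under which the same page may be linked."""
    # Drop the scheme so that the variants start with "//" as on the slide.
    rest = url.split("://", 1)[-1]
    if rest.endswith("/index.html"):
        base = rest[: -len("index.html")]  # keeps the trailing slash
        variants = [rest, base, base.rstrip("/")]
    else:
        base = rest.rstrip("/")
        # Assumption: the default document of the directory is index.html.
        variants = [base + "/index.html", base + "/", base]
    return ["//" + v for v in variants]

for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then searched separately, and the results are combined.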

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
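
A sketch of how such a "normal" search for a URL string could be prepared: the URL is wrapped in quotes to request an exact phrase and is then encoded for use in a request. The search address https://search.example/search and the parameter name q are placeholders, not the interface of any particular search engine.

```python
from urllib.parse import urlencode

def phrase_query(url: str) -> str:
    """Wrap the URL in double quotes to ask for an exact phrase match."""
    return '"' + url + '"'

def search_request(base: str, url: str, param: str = "q") -> str:
    """Build a request address; base and param are assumptions."""
    return base + "?" + urlencode({param: phrase_query(url)})

target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(phrase_query(target))
print(search_request("https://search.example/search", target))
```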

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 3


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
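
The difference between the two orderings can be sketched as follows. The site names and link counts are invented, and counting inbound links is only the simplified view given above, not the actual ranking method of Google.

```python
# Hypothetical directory category: site -> number of inbound links (made up).
entries = {"alpha.example": 3, "beta.example": 12, "gamma.example": 7}

def alphabetical(entries):
    """Ordering as in a directory that only sorts links alphabetically."""
    return sorted(entries)

def by_inbound_links(entries):
    """Most-linked sites first; ties are broken alphabetically."""
    return sorted(entries, key=lambda site: (-entries[site], site))

print(alphabetical(entries))
print(by_inbound_links(entries))
```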

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory can lead to
» the complete WWW
» a directory limited to sources in/of a country or region
» a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
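
The two components described above can be sketched in a few lines: an index built automatically from page texts, and a search function on top of it. The pages are invented and stand in for what a robot would collect; combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, as they might have been collected by a robot.
pages = {
    "http://site-a.example/": "Marine biology of the North Sea",
    "http://site-b.example/": "Hydrology and marine science",
    "http://site-c.example/": "Civil engineering handbook",
}
index = build_index(pages)
print(search(index, "marine science"))
```

The query is answered from the index only, which is why such a system does not need to visit the pages at search time.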

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered;
the first part of many files in other file formats is also
indexed, such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
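
The two ranking factors above (many pages point to it, "important" pages point to it) can be illustrated with a small iterative computation in the style of the published PageRank idea. This is a sketch under assumptions: the link graph is invented, the damping value 0.85 is a common textbook choice, and the real ranking of Google uses more signals than this.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from high-scoring pages more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = link_rank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

In this graph page c ranks highest because many pages point to it, and page a ranks above b and d because the "important" page c points to it.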

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
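
A minimal sketch of how such a query string can be composed; the function name is hypothetical and the code only builds the text of the query, it does not contact any search system.

```python
def expand_query(words, expand):
    """Put a tilde in front of every word for which synonyms are wanted."""
    return " ".join("~" + w if w in expand else w for w in words)

print(expand_query(["word1", "word2", "word3", "word4"], {"word2"}))
```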

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
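
Why a second index helps can be made visible by comparing two result lists for the same query. The result lists below are invented; the sketch only shows the bookkeeping.

```python
def coverage_report(results_a, results_b):
    """Compare the result sets that two indexes return for one query."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
        # Share of all found documents that both systems returned.
        "overlap": len(a & b) / len(union) if union else 0.0,
    }

# Hypothetical results of one query in two different Internet indexes.
index_1 = ["doc1", "doc2", "doc3"]
index_2 = ["doc2", "doc4"]
report = coverage_report(index_1, index_2)
print(report)
```

Every document listed under only_a or only_b would have been missed by a searcher who used just one of the two systems.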

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
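
Recall and precision, mentioned above, can be computed for a single search when the set of relevant documents is known. The document identifiers are invented; the definitions are the standard ones from information retrieval.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```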

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
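
What truncation does in systems that support it can be sketched as follows. The word list is invented, and the asterisk as truncation symbol is an assumption, since the symbol differs between retrieval systems. When truncation is not available, the expanded variants can be combined manually with OR.

```python
def truncation_match(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(w for w in vocabulary if w.startswith(stem))
    return sorted(w for w in vocabulary if w == pattern)

def expand_to_or_query(pattern, vocabulary):
    """Write the matching variants as one OR query."""
    return " OR ".join(truncation_match(pattern, vocabulary))

# Hypothetical vocabulary of an index.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", vocabulary))
print(expand_to_or_query("librar*", vocabulary))
```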

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
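
A proximity operator such as NEAR tests whether two words occur within a given number of words of each other. The sketch below shows the principle on one text; the default distance of 5 words is an assumption, as each system that offers NEAR defines its own distance.

```python
import re

def near(text, term_a, term_b, distance=5):
    """True if term_a and term_b occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

text = "Marine biology studies life in the sea"
print(near(text, "marine", "sea", distance=6))
print(near(text, "marine", "sea", distance=3))
```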

96

**--

Meta-search systems:
scheme 1
[Diagram: user (in / out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: user (in / out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
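
A minimal sketch of the integration step described above: several primary search systems are queried in parallel and their result lists are merged, with repeated results removed. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query over the Internet. The URL normalisation rule is also an assumption chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel and merge their hit lists without repeats."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    # Interleave by rank, so that the top hit of every engine comes first.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical stand-ins for two primary search systems.
def engine_a(query):
    return ["http://www.example.org/a", "http://example.org/b/"]


def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]


hits = meta_search("open access", [engine_a, engine_b])
```

The page that both engines return under slightly different URLs appears only once in the merged list.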

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet / WWW]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
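
A minimal sketch of how such a tracking system could distinguish modified pages from new pages: it compares fingerprints of two snapshots. The snapshots are hypothetical dictionaries that map a URL to the page text; fetching the pages is left out, and the use of a SHA-256 hash is an assumption for this illustration, not a description of an existing service.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare two snapshots {url: text}; return (modified, new) URL lists."""
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    new = sorted(url for url in current if url not in previous)
    return modified, new


# Hypothetical snapshots taken at two different moments.
last_week = {"http://example.org/a": "old text", "http://example.org/b": "same"}
today = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same",
    "http://example.org/c": "brand new page",
}
modified, new = changed_pages(last_week, today)
```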

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
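
A minimal sketch of an interest profile stored as keywords and matched against the titles of newly published books. The titles and keywords are hypothetical, and the simple word matching is an assumption made for illustration; it does not describe how any bookshop actually implements its alerts.

```python
def matches_profile(title, keywords):
    """True if at least one profile keyword occurs as a word in the title."""
    words = set(title.lower().split())
    return any(keyword.lower() in words for keyword in keywords)


def alerts(new_titles, profile_keywords):
    """Select the new titles that fit the stored interest profile."""
    return [title for title in new_titles if matches_profile(title, profile_keywords)]


# Hypothetical data.
profile = ["ecology", "oceanography"]
new_books = ["Marine Ecology Today", "Cooking Basics", "Introduction to Oceanography"]
to_announce = alerts(new_books, profile)
```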

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Diagram: relations of a term in a thesaurus]
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
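
A minimal sketch of the procedure described above: the searcher looks up a word in a thesaurus and uses the synonyms (UF) and narrower terms (NT) found there to formulate a broader query. The thesaurus fragment is hypothetical and only illustrative, and the sketch assumes that the target database accepts an OR operator and quoted phrases.

```python
# Hypothetical thesaurus fragment; the terms are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its terms under the chosen relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term from the thesaurus instead of the original word is a way to increase precision.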

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
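
The two directions of citation searching can be sketched with a small citation graph. The data below are hypothetical: each document is mapped to the documents that it cites. Following the reference lists leads to older publications (snowball); looking up which documents cite the seed, as a citation index does, leads to more recent publications.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found


def cited_by(doc, cites):
    """Citation-index lookup: documents whose reference lists contain doc."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older_publications = snowball("seed", CITES)
newer_publications = cited_by("seed", CITES)
citation_count = len(newer_publications)
```

The length of the "cited by" list is the number of citations received by the seed document within this data set.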

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
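
As a rough illustration of the first two applications, the following Python sketch counts incoming links in a small, invented link table. The page addresses are hypothetical, and a real search engine would of course work on a much larger, crawled collection.

```python
from collections import Counter

# Hypothetical link data: page -> pages it links to.
links = {
    "a.example/page": ["target.example/", "b.example/page"],
    "b.example/page": ["target.example/"],
    "c.example/page": ["target.example/", "a.example/page"],
}

def inlink_counts(link_map):
    """Count how many distinct pages link to each target (self-links ignored)."""
    counts = Counter()
    for source, targets in link_map.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

def who_links_to(link_map, target):
    """List the pages that contain a link to the given target."""
    return sorted(s for s, targets in link_map.items()
                  if target in targets and s != target)

print(inlink_counts(links).most_common())
print(who_links_to(links, "target.example/"))
```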

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
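
A minimal sketch of this "normal" searching, applied to a few invented page texts instead of a real search engine: a page counts as citing the document as soon as the address occurs anywhere in its text, inside a hyperlink or not. The page names and texts are hypothetical.

```python
def pages_mentioning(url, pages):
    """Return names of pages whose text contains the URL, with or without its scheme."""
    bare = url.split("://", 1)[-1]
    return sorted(name for name, text in pages.items() if bare in text)

# Hypothetical page texts.
pages = {
    "a.html": 'See <a href="http://www.computer.com/directory/subdirectory/web-page.html">this</a>.',
    "b.html": "Quoted in plain text: www.computer.com/directory/subdirectory/web-page.html",
    "c.html": "No mention of that address here.",
}

target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(pages_mentioning(target, pages))
```

Note that page b.html would be missed by a search restricted to hyperlink fields, because the address appears there only as plain text.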

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
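
The difference between the two orderings can be shown with a few invented directory entries. The site names and link counts below are hypothetical, and the actual link analysis used by Google is more elaborate than a simple count.

```python
# Hypothetical directory entries with invented counts of received links.
entries = [("Alpha Marine Lab", 12), ("Zeta Ocean Data", 340), ("Delta Fisheries", 95)]

def alphabetical(items):
    """Order entries by name only, as in a plain alphabetical listing."""
    return [name for name, _ in sorted(items, key=lambda e: e[0].lower())]

def by_links(items):
    """Order entries by links received; ties fall back to the name."""
    return [name for name, _ in sorted(items, key=lambda e: (-e[1], e[0].lower()))]

print(alphabetical(entries))
print(by_links(entries))
```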

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
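
The two components described above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: the pages are invented and given directly instead of being collected by a robot, and a query returns only the pages that contain all query words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (a choice made for this sketch)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical pages, as if collected earlier by a robot.
pages = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Marine biology of the North Sea",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the index alone, without contacting the pages themselves, which is why such systems can respond quickly.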

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
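
The two ranking criteria can be illustrated with a simplified link-analysis iteration in the style of PageRank. This is a sketch only, not the algorithm actually used by Google: the damping value, the number of iterations and the toy link table are assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link table: page -> pages it links to.
links = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "c": ["hub", "a"]}
ranks = link_rank(links)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

In this toy table, "hub" ranks first because three pages point to it, and "a" ranks above "b" because it receives one extra link.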

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
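
The idea of such an automatic expansion can be sketched as follows. The synonym table is invented and tiny, and the way Google selects synonyms internally is not described here, so this only shows the principle: a word marked with a tilde is replaced by a group of alternatives.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {"car": ["automobile", "vehicle"], "cheap": ["inexpensive"]}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; "cheap" stays as it is although the table contains a synonym for it.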

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
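
A very rough Python sketch of the idea, for the ambiguous query "pascal". Real systems derive the clusters automatically from the retrieved documents; here the categories, the cue words and the result titles are hand-picked and hypothetical, so this is closer to simple classification than to true clustering.

```python
# Hand-picked cue words per category: an assumption made for this sketch only.
CATEGORIES = {
    "philosopher": {"blaise", "philosophy", "mathematician"},
    "programming language": {"compiler", "programming", "turbo", "code"},
}

def group_results(titles, categories=CATEGORIES):
    """Assign each result title to the first category whose cue words it contains."""
    groups = {name: [] for name in categories}
    groups["other"] = []
    for title in titles:
        words = set(title.lower().split())
        for name, cues in categories.items():
            if words & cues:
                groups[name].append(title)
                break
        else:
            groups["other"].append(title)
    return groups

# Hypothetical result titles for the query "pascal".
titles = ["Blaise Pascal biography", "Pascal compiler download", "Pascal Street restaurant"]
groups = group_results(titles)
for name, members in groups.items():
    print(name, "->", members)
```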

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.

• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.

• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
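
Recall and precision, mentioned above, have standard definitions: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that were retrieved. A minimal sketch with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant documents exist, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```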

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
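
A searcher can partly work around missing truncation by expanding a truncated word on the client side and joining the variants with OR. The sketch below is only an illustration under assumptions: the word list is made up, and the helper names are hypothetical.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching words."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def build_or_query(term, vocabulary):
    """Join all expansions with OR, so that no truncation operator is needed."""
    words = expand_truncation(term, vocabulary)
    return " OR ".join(words) if words else term.rstrip("*")


# Hypothetical word list; in practice it could come from a dictionary or thesaurus.
word_list = ["library", "libraries", "librarian", "liberty", "book"]
query = build_or_query("librar*", word_list)
print(query)
```

The resulting query string can then be typed into a search system that accepts OR between words.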

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
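
The time saving comes from sending one query to several search systems at the same time. A minimal sketch, assuming each search system can be called as a function that returns a list of URLs; the two functions below are stand-ins, not real search interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real search systems: each takes a query and returns a list of URLs.
def search_system_1(query):
    return ["http://example.org/a", "http://example.org/b"]


def search_system_2(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, systems):
    """Send the same query to every system in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"system 1": search_system_1,
                                      "system 2": search_system_2})
print(results)
```

The user formulates the query once; the program takes care of the separate searches.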

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
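
How such an integration with removal of repeated results could work is sketched below. This is an assumption for illustration, not the method of any named meta-search system: ranked lists are interleaved, and URLs that differ only in "www." or a trailing slash are treated as the same result.

```python
def normalise(url):
    """Crude normalisation so trivially different URLs compare equal."""
    url = url.strip().lower()
    if url.startswith("http://www."):
        url = "http://" + url[len("http://www."):]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_1 = ["http://www.dogpile.com", "http://www.kartoo.com"]
list_2 = ["http://dogpile.com/", "http://www.mamma.com"]
merged = merge_results([list_1, list_2])
print(merged)
```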

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is available through the Internet]
• Other Internet services: telnet, ftp, ...
• WWW: static texts that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
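
The difference between a modified page and a new page can be shown with a small sketch. It assumes that the tracking system stores a fingerprint of each page it has seen; the page texts are made up, and a real service would download the pages itself.

```python
import hashlib


def fingerprint(page_content):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_for_changes(stored, current_pages):
    """Compare current pages with stored fingerprints.

    Returns (changed_urls, new_urls) and updates `stored` in place.
    """
    changed, new = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)          # page was never seen before
        elif stored[url] != digest:
            changed.append(url)      # page is known, but its content differs
        stored[url] = digest
    return changed, new


# Hypothetical snapshot data.
stored = {"http://example.org/a": fingerprint("old text")}
changed, new = check_for_changes(stored, {
    "http://example.org/a": "new text",
    "http://example.org/b": "first version",
})
print(changed, new)
```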

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
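
Matching new books against a stored interest profile could look roughly like this. The profile format and the book records are assumptions made for this sketch; they do not describe how any bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """True if a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical profile with keywords and subject categories.
profile = {"keywords": ["oceanography", "tides"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral Reef Fishes", "category": "Marine biology"},
    {"title": "Cooking for Beginners", "category": "Food"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```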

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
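
The procedure described above can be sketched as follows. The thesaurus entry is made up for this example; it uses the relation types BT, NT, RT and UF, and the searcher chooses which relations to follow.

```python
# A tiny, made-up thesaurus entry using the relation types BT, NT, RT and UF.
thesaurus = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine environment"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms linked by the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for linked in entry.get(relation, []):
            if linked not in terms:
                terms.append(linked)
    # Quote multi-word terms so they are searched as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = expand_query("sea", thesaurus)
print(query)
```

Adding synonyms (UF) and narrower terms (NT) tends to increase recall; following BT or RT as well widens the search further, usually at the cost of precision.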

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
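
Both directions of citation searching can be sketched with a small citation graph. The document names and links are made up; a real citation index holds the same kind of data on a much larger scale.

```python
# Made-up citation data: each document maps to the older documents it cites.
cites = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found


def citing_documents(doc, cites):
    """What a citation index provides: newer documents that cite `doc`."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older = snowball("seed", cites)
newer = citing_documents("seed", cites)
print(older, newer)
```

The reference lists alone are enough for the snowball direction; finding the citing documents requires an index over many reference lists.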

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
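
As a rough illustration of these applications, the sketch below counts links received per page and lists the pages that link to a given URL. The link data is invented (the example.* addresses are hypothetical); a real search engine would answer such questions from its own crawled database.

```python
from collections import Counter

# Hypothetical link data: each page -> the URLs it links to.
outlinks = {
    "http://a.example/": ["http://target.example/", "http://b.example/"],
    "http://b.example/": ["http://target.example/"],
    "http://c.example/": ["http://target.example/", "http://a.example/"],
}

def count_inlinks(links):
    """Count how many distinct pages link to each URL ("sitations")."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts

def linking_pages(url, links):
    """List the pages that contain a link to the given URL."""
    return sorted(s for s, targets in links.items() if url in targets and s != url)

print(count_inlinks(outlinks).most_common())
print(linking_pages("http://target.example/", outlinks))
```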

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
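
The variants listed above can be generated automatically before running the link searches. This is only a sketch: the function name `url_variants` is made up here, and the assumption that `index.html` is the default page name will not hold for every web server.

```python
def url_variants(url, default_pages=("index.html",)):
    """Return spellings of a URL that often lead to the same page.

    The default page names are an assumption; servers can be configured
    with other names (for example index.htm or default.htm).
    """
    base = url
    for page in default_pages:
        if base.endswith("/" + page):
            base = base[: -len(page)]  # strip the page name, keep the slash
            break
    base = base.rstrip("/")
    variants = [base, base + "/"]
    variants += [base + "/" + page for page in default_pages]
    return variants

for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```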

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 5


Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
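
The mechanism described above can be sketched with a very small inverted index. The pages below stand in for documents collected by the robot; their URLs and texts are invented, and combining the query words with AND is a choice made for this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler ("robot"); all content is invented.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}

def build_index(collected):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the index only; the pages themselves are not visited at search time, which is what makes this approach fast.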

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
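
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified PageRank-style calculation. This is a sketch only: the link graph is invented, the damping value and number of iterations are conventional illustrative choices, and the real ranking used by Google is not described in these slides.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    links maps each page to the pages it links to. A page passes its score
    on to the pages it links to, so links from high-scoring pages count more.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical pages: "hub" receives links from three other pages.
links = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example page "a" receives only two links, but one of them comes from the highly ranked "hub", so "a" ends up far above "b" and "c", which receive no links at all.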

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
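
What such an automatic expansion amounts to can be sketched as follows. The synonym list is tiny and invented for illustration, and the function `expand_query` is hypothetical; how Google selects synonyms internally is not described here.

```python
# Tiny hand-made synonym list, invented for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each word marked with ~ by an OR-group of the word and its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            group = [term] + synonyms.get(term.lower(), [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                parts.append(term)  # no synonyms known: keep the word as it is
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("~cheap car rental"))  # (cheap OR inexpensive) car rental
```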

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
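
When several Internet indexes are used for the same search, their result lists overlap only partly and must be combined. A minimal sketch, assuming each engine returns a ranked list of URLs (the lists and the function name `merge_results` are invented for this example):

```python
def merge_results(*result_lists):
    """Merge ranked URL lists from several engines, dropping duplicates.

    URLs are interleaved by rank position so that no single engine dominates.
    """
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged

engine_a = ["u1", "u2", "u3"]   # hypothetical results of a first index
engine_b = ["u2", "u4"]         # hypothetical results of a second index
print(merge_results(engine_a, engine_b))  # ['u1', 'u2', 'u4', 'u3']
```

Here "u4" is found only by the second engine, which is exactly the gain in coverage that using more than one index is meant to provide.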

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
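
Recall and precision can be made concrete with a small calculation. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are relevant in total.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

A specialised index aims to raise both numbers: fewer irrelevant hits (precision) and fewer missed relevant documents (recall).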

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
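
To illustrate what manual truncation means in systems that do offer it: a query such as librar* matches every word that starts with that stem. This is a minimal sketch with an invented word list; it is not a description of how any particular search system implements truncation.

```python
def truncation_match(pattern, words):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # Without the truncation symbol, only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]

words = ["library", "libraries", "librarian", "liberty"]
matches = truncation_match("librar*", words)
print(matches)
```

Without truncation or stemming, the searcher has to enter each word variant separately.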

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
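
A proximity operator such as NEAR requires two words to occur close to each other in a document. The sketch below shows the idea on a single sentence; the function name, the default distance and the example text are assumptions made for illustration.

```python
def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

sentence = "Open access journals are free to read"
print(near(sentence, "open", "journals", max_distance=2))
print(near(sentence, "open", "read", max_distance=2))
```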

96

**--

Meta-search systems:
scheme 1
User (In / Out)
↔ Client computer + WWW client program
↔ Internet / WWW
↔ WWW server computer
↔ WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
User (In / Out)
↔ Client computer + Multi-threaded Internet search client program
↔ Internet / WWW
↔ WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
User (In / Out)
↔ Client computer + WWW client program
↔ Internet / WWW
↔ WWW server computer
↔ WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
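
The idea of clustering on the fly can be sketched very roughly: only the retrieved snippets are analysed, and results that share a frequent word are grouped under that word. This is a toy approach chosen for illustration, not the method used by Vivisimo; the query, the snippets and the stop-word list are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query word."""
    query_terms = set(query.lower().split())
    token_sets = []
    for snippet in snippets:
        tokens = {t.strip(".,;:!?").lower() for t in snippet.split()}
        token_sets.append(tokens - STOPWORDS - query_terms - {""})
    # Count in how many snippets each word occurs.
    doc_freq = Counter(t for tokens in token_sets for t in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        shared = [t for t in tokens if doc_freq[t] > 1]
        label = max(shared, key=lambda t: (doc_freq[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar cat habitat",
]
clusters = cluster_results("jaguar", snippets)
print(clusters)
```

The ambiguous query is split into a "car" group and a "habitat" group without any processing of the documents before the search.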

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
↔ Client computer + Multi-threaded Internet search client program
↔ Internet / WWW
↔ WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
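
The integration step can be sketched as follows: the ranked lists returned by the primary search systems are interleaved, and a result is dropped when its normalised URL has already been seen. The normalisation rule (ignore case, a leading "www." and a trailing slash) is a simplifying assumption, and the result lists are invented.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a simple comparison key (lossy, for illustration only)."""
    parts = urlsplit(url.lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and remove repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
engine_2 = ["http://dogpile.com", "http://www.kartoo.com"]
merged = merge_results([engine_1, engine_2])
print(merged)
```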

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Accessible through the Internet:
• telnet, ftp, ...
• WWW:
» Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
» Word files, PDF files
» Databases and file archives accessible through
the Internet (CGI, ASP,...)
» Information accessible only when passwords are used
» Rapidly changing information, such as news


***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
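
The difference between tracking modified pages and finding new pages can be sketched with stored fingerprints: a page counts as modified when its fingerprint differs from the one stored at the previous visit, and as new when no fingerprint was stored at all. The URLs and page texts are invented, and real alert services may work quite differently.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page text after collapsing whitespace, so layout-only changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def compare_visits(previous, current):
    """previous: URL -> stored fingerprint; current: URL -> page text now.
    Returns (modified URLs, new URLs)."""
    modified = sorted(url for url, text in current.items()
                      if url in previous and previous[url] != fingerprint(text))
    new = sorted(url for url in current if url not in previous)
    return modified, new

# Hypothetical data from two visits.
previous = {
    "http://example.org/a": fingerprint("Old text"),
    "http://example.org/b": fingerprint("Same text"),
}
current = {
    "http://example.org/a": "New text",
    "http://example.org/b": "Same   text",
    "http://example.org/c": "Brand new page",
}
modified, new = compare_visits(previous, current)
print(modified, new)
```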

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM → RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic → ?
To find book titles published before 1990 → ?
To find a book title through a title search → ?
To find the price of a book → ?
To be informed regularly about new books → ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM → RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic → Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 → national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general → Library of Congress, British Library, Infoball
To find the price of a book → Global Books in Print, Infoball, online bookshops
To be informed regularly about new books → Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.
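
On the technical side, the Open Archives Initiative defines a harvesting protocol (OAI-PMH) in which a repository answers simple HTTP requests. The sketch below only builds such a request URL with the protocol's ListRecords verb and the oai_dc (Dublin Core) metadata format; the repository address is a placeholder, not a real service, and no request is actually sent.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set of records
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return base_url + "?" + urlencode(params)

# Placeholder repository address, used for illustration only.
url = list_records_url("http://repository.example.org/oai", from_date="2005-01-01")
print(url)
```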

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
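
A minimal sketch of this procedure: the searcher looks up a term in a thesaurus, collects synonyms (UF) and narrower terms (NT), and combines them with OR into a query for a database without controlled descriptors. The thesaurus entry is invented for illustration and does not come from any of the systems mentioned.

```python
# Hypothetical thesaurus entry using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return " OR ".join(f'"{t}"' for t in terms)

query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term from the thesaurus instead can increase precision.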

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
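The two directions of citation searching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document identifiers and the REFERENCES data are invented, and a real citation index is of course far larger and built by a database producer.

```python
from collections import defaultdict

# Hypothetical toy data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["a1990", "b1995"],
    "a1990": ["c1980"],
    "b1995": ["c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def backward_snowball(doc, references):
    """Follow reference lists repeatedly to collect older publications."""
    found, todo = set(), [doc]
    while todo:
        current = todo.pop()
        for cited in references.get(current, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    index = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    return index


def forward_citations(doc, references):
    """Return the more recent documents that cite the given document."""
    return set(build_citation_index(references).get(doc, set()))


older = backward_snowball("seed", REFERENCES)   # towards the past
newer = forward_citations("seed", REFERENCES)   # towards the future
```

Here `older` contains the publications reached by the snow-ball effect, and `newer` contains the publications that a citation index would report as citing the seed document.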

211

***-

Information retrieval:
using citations (scheme)
[Scheme: the seed document on a time axis (Past, Now, Future);
snowball citation searching points towards the past,
citation indexing points towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
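As an illustration of what a link search does behind the scenes, the following Python sketch extracts the hyperlinks from a few pages and reports which pages link to a known URL. The pages and URLs are hypothetical; a real search engine works on its own large database, not on a small dictionary like this one.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in an HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target, pages):
    """Return the URLs of the pages that contain a link to `target`."""
    citing = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            citing.append(url)
    return sorted(citing)


# Hypothetical pages, invented for this example.
PAGES = {
    "http://example.org/a.html":
        '<p>See <a href="http://example.org/report.html">the report</a>.</p>',
    "http://example.org/b.html":
        '<a href="http://example.org/other.html">other</a>',
    "http://example.org/c.html":
        '<a href="http://example.org/report.html">report</a> and '
        '<a href="http://example.org/a.html">a</a>',
}

citing_pages = pages_linking_to("http://example.org/report.html", PAGES)
```

The length of `citing_pages` is then a simple count of the web citations received by the target page.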

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
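The variants can be generated automatically before searching. The sketch below is an assumption-laden helper, not a feature of any search system: it assumes that the default page of a directory is named index.html or index.htm, which depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the spellings under which a page's address may appear in links."""
    # Drop the scheme, so that the address starts with the server name.
    address = url.split("://", 1)[-1]
    variants = [address]
    for index_name in ("index.html", "index.htm"):  # assumed default names
        if address.endswith("/" + index_name):
            directory = address[: -len(index_name)]  # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))
            break
    else:
        # Not an index page: only add the form without a trailing slash.
        if address.endswith("/"):
            variants.append(address.rstrip("/"))
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each element of `variants` can then be used in a separate link query, and the results can be merged.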

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 6

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme relating the complete WWW and a global subject directory,
which can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically (machine-made) from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
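The mechanism described above can be illustrated with a very small inverted index in Python. This is a minimal sketch under stated assumptions: the pages are invented and stand in for what a robot has collected, and the search combines query words with an implicit AND, which is a choice made for this example only.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/sea": "The sea covers most of the planet",
    "http://example.org/ocean": "Ocean currents move sea water",
    "http://example.org/lake": "A lake is surrounded by land",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)          # built in advance, not at query time
hits = search(index, "sea water")   # answered from the index only
```

The point of the design is that the query is answered from the prepared index, so that the pages themselves do not have to be visited when the user searches.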

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
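The idea that a page ranks higher when many pages, and in particular “important” pages, point to it can be sketched with a simplified iterative link score in Python. This is NOT the actual Google algorithm, whose details are not given here; the damping value of 0.85, the number of iterations and the toy link data are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical link structure: "d" is linked from the well-linked page "c",
# while "f" is linked only from the otherwise unlinked page "e".
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["f"], "f": []}
scores = link_rank(LINKS)
```

In this toy example, "d" and "f" each receive exactly one link, but "d" ends up with the higher score because its link comes from a page that is itself well linked.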

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
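What such an expansion amounts to can be sketched in Python. This is only an illustration of the principle, not a description of how Google implements the tilde operator: the small SYNONYMS table is hypothetical, and the OR notation of the output is an assumption about how an expanded query could be written.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger resource.
SYNONYMS = {
    "sea": ["ocean"],
    "pollution": ["contamination"],
}


def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("oil ~pollution sea", SYNONYMS)
# expanded == "oil (pollution OR contamination) sea"
```

Only the word marked with the tilde is expanded; the word "sea" stays as it is, although the table contains a synonym for it.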

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.
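
The gain from combining indexes can be illustrated with a small sketch. The page identifiers and the contents of the two indexes below are invented for illustration only; they do not represent real coverage figures.

```python
def coverage(indexed_pages, all_pages):
    """Fraction of all_pages that is present in indexed_pages."""
    return len(indexed_pages & all_pages) / len(all_pages)

# Hypothetical toy "Web" of ten pages and two partial indexes.
web = {f"p{i}" for i in range(1, 11)}
index_a = {"p1", "p2", "p3", "p4"}
index_b = {"p3", "p4", "p5", "p6", "p7"}

print(coverage(index_a, web))            # 0.4
print(coverage(index_b, web))            # 0.5
print(coverage(index_a | index_b, web))  # 0.7
```

Because the two indexes overlap only partly, searching both covers more pages than either one alone.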

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
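
Recall and precision can be made concrete with a short calculation. The sketch below assumes that the set of relevant documents is known in advance, which is rarely the case in practice; the document identifiers and both result lists are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers, invented for illustration.
relevant = {"d1", "d2", "d3", "d4"}
general_index = ["d1", "d9", "d8", "d7", "d2", "d6", "d5", "d10"]
specialised_index = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_index, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_index, relevant))  # (0.75, 0.75)
```

Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that are retrieved.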

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
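
When a search system offers no truncation, the searcher can approximate it by listing the word variants explicitly. The following is a minimal sketch of that workaround, assuming that the target system accepts an OR operator; the vocabulary is a hypothetical word list, not something supplied by any search engine.

```python
def expand_truncation(term, vocabulary):
    """Expand a truncated term such as 'librar*' into matching words."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))

def to_or_query(words):
    """Join the variants into one query string."""
    return " OR ".join(words)

# Hypothetical vocabulary, invented for illustration.
vocabulary = {"library", "libraries", "librarian", "librarians", "liberty"}
words = expand_truncation("librar*", vocabulary)
print(to_or_query(words))
# librarian OR librarians OR libraries OR library
```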

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
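
To show what a proximity operator such as NEAR does, here is a minimal sketch that checks whether two words occur within a given number of words of each other. The function name and the default distance are assumptions made for this illustration; they do not describe the syntax of any particular search system.

```python
import re

def near(text, a, b, max_distance=5):
    """True if words a and b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "Open access journals are listed in a directory of journals"
print(near(sentence, "access", "journals", 1))  # True
print(near(sentence, "open", "directory", 5))   # False
```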

96

**--

Meta-search systems:
scheme 1
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
User (In / Out)
Client computer + Multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
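
Clustering on the fly can be sketched as follows. This is a deliberately simple illustration and NOT the method used by Vivisimo, which is not described here: each result is labelled with the term it shares with the most other results. The snippets and the stop-word list are invented for this example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}

def tokens(text, query):
    """Lower-case words of a snippet, without stop words and the query word."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w != query.lower()}

def cluster(query, snippets):
    """Group snippets under the term they share with most other snippets."""
    token_sets = [tokens(s, query) for s in snippets]
    doc_freq = Counter(w for ts in token_sets for w in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, token_sets):
        label = max(sorted(ts), key=lambda w: doc_freq[w]) if ts else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal philosopher, life and works",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
]
for label, members in cluster("pascal", snippets).items():
    print(label, len(members))
```

Only the retrieved snippets are analysed, so no document has to be processed before the search is made.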

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
Client computer + Multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
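
The integration step can be sketched as follows: several primary search systems are queried in parallel and their result lists are merged, keeping only the first occurrence of each URL. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel and merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("open access", [engine_a, engine_b]))
```

The repeated result from the second engine is dropped, so the user sees each URL only once.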

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems ask the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
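
The difference between a modified page and a new page can be sketched with stored fingerprints. This is only an assumed approach for illustration; the services mentioned in these slides do not publish how they detect changes. The URLs and page texts are hypothetical.

```python
import hashlib

def fingerprint(page_text):
    """Short, fixed-length summary of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """previous: {url: fingerprint}; current: {url: page text}."""
    report = {"modified": [], "new": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(text):
            report["modified"].append(url)
    return report

# Hypothetical pages fetched last week and today.
last_week = {
    "http://example.org/news": fingerprint("old news"),
    "http://example.org/about": fingerprint("about us"),
}
today = {
    "http://example.org/news": "fresh news",
    "http://example.org/about": "about us",
    "http://example.org/report": "a new report",
}
print(changed_pages(last_week, today))
```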

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
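
How a stored interest profile can be matched against newly published books is sketched below. The record fields, the profile and the book data are invented for illustration and do not describe how Amazon or any other bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True if the book fits the profile by keyword or by subject category."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile["keywords"]}
    keyword_hit = bool(title_words & keywords)
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical interest profile and new book records.
profile = {"keywords": ["aquaculture", "fisheries"], "categories": ["Oceanography"]}
new_books = [
    {"title": "Sustainable Aquaculture Practice", "category": "Agriculture"},
    {"title": "Waves and Tides", "category": "Oceanography"},
    {"title": "Medieval Poetry", "category": "Literature"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```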

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
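
The use of a thesaurus to build a better query can be sketched as follows. The thesaurus entry is a hypothetical example that uses the relation codes BT, NT, RT and UF; it is not taken from any of the thesaurus systems listed in these slides, and the OR syntax is assumed to be accepted by the database that is searched.

```python
# Hypothetical thesaurus entry, invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym for which "sea" is used
    },
}

def expand(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms

def to_query(terms):
    """Join terms with OR; phrases of several words are put in quotes."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(to_query(expand("sea", THESAURUS)))
# sea OR ocean OR "inland sea"
```

Adding synonyms and narrower terms tends to raise recall; choosing only the most suitable terms tends to raise precision.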

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
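
The two directions of citation searching can be sketched on a small citation graph. The document names below are hypothetical, and a real citation index works on a far larger database; this is only meant to show the difference between following references into the past and looking up citing documents.

```python
# Hypothetical citation data: each key cites the documents in its list.
cites = {
    "paper_2005": ["seed_2000"],
    "paper_2003": ["seed_2000", "old_1990"],
    "seed_2000": ["old_1995", "old_1990"],
    "old_1995": ["old_1990"],
    "old_1990": [],
}


def backward(doc, cites):
    """Snowball search: follow reference lists to older work, step by step."""
    found, stack = set(), list(cites.get(doc, []))
    while stack:
        ref = stack.pop()
        if ref not in found:
            found.add(ref)
            stack.extend(cites.get(ref, []))
    return found


def forward(doc, cites):
    """Citation index lookup: more recent documents that cite `doc` directly."""
    return {source for source, refs in cites.items() if doc in refs}
```

Here `backward("seed_2000", cites)` walks towards the past, while `forward("seed_2000", cites)` needs an index over all citing documents, which is what a citation index provides.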

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
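
Counting citations from the results of a citation search can be sketched as follows. The records are invented for illustration; in practice they would come from a citation database, and the counts are only estimates because no database covers all citing documents.

```python
from collections import Counter

# Hypothetical records: (citing document, cited document, cited author).
citations = [
    ("doc1", "paperA", "Author X"),
    ("doc2", "paperA", "Author X"),
    ("doc2", "paperB", "Author Y"),
    ("doc3", "paperC", "Author X"),
]

# Number of citations received per author and per cited document.
per_author = Counter(author for _, _, author in citations)
per_article = Counter(cited for _, cited, _ in citations)
```

The same counting applies to a single journal article, using `per_article` instead of `per_author`.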

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
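
The variants listed above can be generated automatically. This is a minimal sketch: the function name `url_variants` is hypothetical, and it assumes that `index.html` is the default document of the web server, which is not the case for every site.

```python
def url_variants(url: str) -> list[str]:
    """Return the three common spellings of the address of a site's start page."""
    base = url
    # Reduce the address to the form without trailing "/index.html" or "/".
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each variant can then be submitted as a separate link query to the search system.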

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 7

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
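
The mechanism described above can be sketched as a tiny inverted index. The pages and URLs below are hypothetical stand-ins for documents collected by a crawler; a real Internet index adds ranking, stemming, phrase searching and much more.

```python
from collections import defaultdict

# Hypothetical pages standing in for documents collected by a "robot".
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "open access journals",
}


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
```

A query is answered from the prebuilt index only, which is why pages that have not yet been collected cannot be found.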

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
All the Web (the other leading Internet database and search
engine) and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
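
The two ranking criteria above (many pages pointing to a page, and “important” pages pointing to it) can be illustrated with a simplified, PageRank-style iteration. This is only a sketch under stated assumptions and not the algorithm actually used by Google: the function `link_rank`, the damping value 0.85 and the tiny link graph are choices made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the score of the linking page."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: each key links to the pages in its list.
links = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
scores = link_rank(links)
```

In this graph "hub" receives the most links and scores highest, while "a" scores well above "b" and "c" mainly because the important page "hub" points to it.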

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
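
Writing such a query by program is straightforward. The helper below is a hypothetical sketch that only builds the query string in the tilde syntax described on this slide; it does not contact any search engine.

```python
def with_synonyms(words, expand):
    """Prefix the selected words with a tilde and leave the other words unchanged."""
    return " ".join("~" + w if w in expand else w for w in words)


query = with_synonyms(["word1", "word2", "word3", "word4"], {"word2"})
```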

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages.
(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
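Recall and precision can be made concrete with a small calculation. The following is a minimal sketch in Python; the document identifiers and the two result lists are invented for illustration and do not come from any real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are relevant for the information need.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d7", "d8", "d9", "d2"]    # broad system
specialised_engine = ["d1", "d2", "d3", "d9"]      # subject-specific system

print(precision_recall(general_engine, relevant))
print(precision_recall(specialised_engine, relevant))
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes; real systems have to be evaluated with real relevance judgements.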

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
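To show what manual truncation means for a searcher, here is a minimal sketch in Python. It assumes a trailing `*` as truncation symbol and a tiny invented document collection; it illustrates the concept only and does not describe how any real search system works.

```python
def matches_truncated(term, word):
    """True if word matches term; a trailing '*' in term means 'any ending'."""
    if term.endswith("*"):
        return word.lower().startswith(term[:-1].lower())
    return word.lower() == term.lower()


def search(term, documents):
    """Return the keys of the documents that contain a word matching term."""
    return [key for key, text in documents.items()
            if any(matches_truncated(term, w) for w in text.split())]


# Hypothetical mini-collection
documents = {
    "A": "the library catalogue",
    "B": "libraries in Belgium",
    "C": "a librarian at work",
    "D": "liberal arts",
}

print(search("library", documents))   # exact word only
print(search("librar*", documents))   # truncated query
```

Without truncation or stemming, the searcher has to enter each word form (library, libraries, librarian...) separately or combine them with OR.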

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
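A proximity operator such as NEAR can be sketched as follows in Python. The function name `near` and the default distance of 3 words are assumptions made for this illustration; systems that offer such an operator each define their own syntax and distance.

```python
def near(text, term_a, term_b, max_distance=3):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


print(near("open access journals", "open", "journals"))
print(near("open access journals are listed in a directory", "open", "directory"))
```

A plain AND query would accept both example texts; the proximity condition accepts only the text in which the two words stand close together.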

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
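The idea of clustering results on the fly can be illustrated with a toy example in Python. This is NOT the method used by Vivisimo, which is not described here; it is a deliberately simple sketch that groups the retrieved snippets under the most widely shared word they contain. The stop word list and the snippets are invented.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "was"}


def cluster_snippets(query, snippets):
    """Group result snippets under the most common non-query word each contains."""
    def terms(text):
        return [w for w in text.lower().split()
                if w not in STOP_WORDS and w != query.lower()]

    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(w for s in snippets for w in set(terms(s)))

    clusters = defaultdict(list)
    for s in snippets:
        candidates = sorted(set(terms(s)))  # sorted to break ties predictably
        label = max(candidates, key=lambda w: doc_freq[w]) if candidates else "other"
        clusters[label].append(s)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal"
snippets = [
    "pascal programming language tutorial",
    "pascal compiler for the programming language",
    "blaise pascal philosopher and mathematician",
    "pascal philosopher biography",
]
clusters = cluster_snippets("pascal", snippets)
for label, members in clusters.items():
    print(label, len(members))
```

Only the snippets that come back from the search are analysed, so no processing of the documents is needed before the query is made.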

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before using the
system.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
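The time saving comes from sending one query to several search systems at the same time. Below is a minimal sketch in Python; `search_index_a` and `search_index_b` are hypothetical stand-ins that return fixed lists, because the real interfaces of the search systems are outside the scope of these slides.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a ranked list of URLs.
def search_index_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def search_index_b(query):
    return ["http://example.org/2", "http://example.org/3"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {"index A": search_index_a, "index B": search_index_b}
results = meta_search("open access", engines)
for name, urls in results.items():
    print(name, urls)
```

The user formulates the query once; the waiting time is roughly that of the slowest system instead of the sum of all systems.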

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
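Integration of results with removal of repeated results can be sketched as follows in Python. The round-robin interleaving and the crude URL normalisation (dropping a trailing slash) are assumptions chosen for this illustration; real meta-search systems use their own merging and ranking rules.

```python
def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = results[rank].rstrip("/")  # crude normalisation
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems
from_system_1 = ["http://example.org/a", "http://example.org/b/"]
from_system_2 = ["http://example.org/b", "http://example.org/c"]

print(merge_results([from_system_1, from_system_2]))
```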

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
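Tracking changes in a page can be reduced to comparing a stored fingerprint of the page with a fresh one. The following minimal sketch in Python assumes that the page texts have already been downloaded; the URLs and texts are invented, and sending the alert by email is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(stored, current_pages):
    """Compare stored digests with fresh page texts; return the URLs to report.

    stored is updated in place, so the next run compares against this run.
    """
    alerts = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if stored.get(url) != digest:  # new page or modified page
            alerts.append(url)
            stored[url] = digest
    return alerts


stored = {}
first = changed_pages(stored, {"http://example.org/news": "version 1"})
second = changed_pages(stored, {"http://example.org/news": "version 1"})
third = changed_pages(stored, {"http://example.org/news": "version 2"})
print(first, second, third)
```

A page that was never seen before is reported in the same way as a modified page; finding new pages about a topic additionally requires a stored search query that is run again at regular intervals.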

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
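The use of a thesaurus before searching can be sketched as a simple query expansion in Python. The mini-thesaurus below is hypothetical and serves only as an illustration; the relation codes BT, NT, RT and UF are the usual thesaurus relations, and the OR syntax of the resulting query is an assumption about the target database.

```python
# Hypothetical mini-thesaurus; the entries are invented for illustration.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water bodies"],
        "RT": ["marine biology"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Multi-word terms are put between quotes so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("UF",)))
```

Adding synonyms and narrower terms mainly increases recall; choosing one more specific term instead of the original one is a way to increase precision.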

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
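The two directions of citation searching can be illustrated with a small sketch. The document names and the reference lists are invented; a real citation index holds such data for millions of publications.

```python
# Hypothetical data: each document maps to the documents it cites.
references = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Follow reference lists backwards in time (snowball effect)."""
    found, stack = set(), [doc]
    while stack:
        for cited in references.get(stack.pop(), []):
            if cited not in found:
                found.add(cited)
                stack.append(cited)
    return found


def citing_documents(doc, references):
    """What a citation index offers: later documents that cite doc."""
    return {d for d, refs in references.items() if doc in refs}


print(sorted(snowball("seed", references)))
print(sorted(citing_documents("seed", references)))
```

The first function only needs the reference lists printed in the documents themselves; the second needs an inverted view over all reference lists, which is why a dedicated citation index is required for it.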

211

***-

Information retrieval:
using citations (scheme)
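(Scheme: a time axis running from past through now to future, with the seed document on it; snowball citation searching leads from the seed document back to older publications, and citation indexing leads forward to more recent publications.)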

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
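Counting the results of a citation search can be sketched as follows. The article identifiers, the author names and the citation pairs are invented for illustration, and the counts are only as complete as the database that was searched.

```python
from collections import Counter

# Hypothetical result of a citation search: (citing article, cited article).
citations = [
    ("p3", "p1"),
    ("p4", "p1"),
    ("p4", "p2"),
    ("p5", "p1"),
]

# Hypothetical mapping from cited article to its author.
author_of = {"p1": "Author A", "p2": "Author A"}

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited in citations)

# Number of citations received by each author, summed over the articles.
per_author = Counter(author_of[cited] for _, cited in citations)

print(per_article)
print(per_author)
```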

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
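A small helper can generate such variants from one known address. This is only a sketch: the function name and the example address are assumptions, and the set of variants that matters depends on how the web server and the search system treat addresses.

```python
def url_variants(url):
    """Return common spellings of the same web address."""
    base = url
    # Strip a trailing default file name, if present.
    for name in ("index.html", "index.htm"):
        if base.endswith("/" + name):
            base = base[: -len(name)]
    # Strip the trailing slash to obtain the shortest form.
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://www.example.org/website/"):
    print(variant)
```

Each variant can then be entered in the link search of the chosen search system, and the result lists can be merged by hand.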

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 8

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
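The mechanism described above can be sketched with a very small index. The three pages and their addresses are invented and stand in for what a robot would collect; the choice to require all query words (an implicit AND) is an assumption of this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a "robot" collects from the Internet.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and sea level measurements",
    "http://example.org/c": "Biology courses for students",
}


def build_index(pages):
    """Build the machine-made index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(search(index, "sea biology"))
```

The query is answered from the prepared index only, not from the pages themselves, which is why such systems can respond quickly but may lag behind the current state of the web.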

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
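The idea of ranking by links can be illustrated with a simplified, PageRank-style iteration. This is a textbook sketch and not the algorithm actually used by Google; the function name, the damping value of 0.85, the number of iterations and the tiny link graph are all assumptions for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links, weighted by the rank of the linking page."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = link_rank(links)
best = max(rank, key=rank.get)
print(best)
```

In this graph, page "c" receives the most links and ranks highest, while page "a" ranks above "b" and "d" because its single incoming link comes from the "important" page "c".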

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
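Composing such a query can be sketched with a small helper. The function name is hypothetical; the helper only builds the query text and does not contact any search engine.

```python
def mark_for_expansion(words, expand):
    """Put a tilde in front of the selected words, leave the others unchanged."""
    return " ".join("~" + w if w in expand else w for w in words)


query = mark_for_expansion(["word1", "word2", "word3", "word4"], {"word2"})
print(query)
```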

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
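Recall and precision can be computed as follows. This is a minimal sketch using the standard definitions; the document identifiers are invented, and in practice the complete set of relevant documents is rarely known.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 4 documents retrieved, 3 documents are relevant in total.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

A specialised system can score higher on both measures because its index contains fewer irrelevant sources and covers the relevant ones more completely.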

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
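To show what truncation and stemming mean, here is a small sketch. It assumes a right-truncation symbol * and a deliberately crude suffix list; the function names are hypothetical, and real systems use more careful stemming algorithms.

```python
def truncation_match(pattern, words):
    """Return the words that match a right-truncated term such as 'librar*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(prefix)]
    return [w for w in words if w.lower() == pattern.lower()]

# Illustrative suffix list only (assumption); not a complete stemmer.
SUFFIXES = ("ies", "es", "s", "ing", "ed")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", vocabulary))
print([naive_stem(w) for w in ["searching", "searches", "searched"]])
```

With manual truncation the searcher chooses where the word is cut; with automatic stemming the system reduces query words and indexed words to a common stem without the searcher's intervention.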

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
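A proximity operator such as NEAR can be sketched as follows. The function name near and the parameter max_distance are hypothetical; the sketch counts distance in words and ignores simple punctuation.

```python
def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "Open access journals are indexed by several free search systems."
print(near(sentence, "open", "journals", max_distance=2))
print(near(sentence, "open", "systems", max_distance=3))
```

A proximity query is stricter than AND (both words must be present and close together) but looser than an exact phrase search.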

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
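The idea of clustering on the fly can be illustrated with a very simple sketch. This is NOT the algorithm of Vivisimo, which is not described here; it only assumes that each result has a title and a short snippet, and it groups results under the snippet word that is shared by most results. The results for the query pascal are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "by"}

def cluster_results(query, results):
    """Group (title, snippet) results under a heading word chosen at search time."""
    query_words = set(query.lower().split())
    tokenised = []
    doc_freq = Counter()
    for title, snippet in results:
        words = {w.strip(".,;:").lower() for w in snippet.split()}
        words -= STOPWORDS | query_words
        words.discard("")
        tokenised.append((title, words))
        doc_freq.update(words)  # count in how many results each word occurs
    clusters = defaultdict(list)
    for title, words in tokenised:
        # Heading = the word of this result that occurs in most results
        # (ties are broken alphabetically).
        heading = max(sorted(words), key=lambda w: doc_freq[w]) if words else "other"
        clusters[heading].append(title)
    return dict(clusters)

# Invented results (assumption, for illustration only).
results = [
    ("Blaise Pascal biography", "Pascal the French philosopher and mathematician"),
    ("The Pensees", "Writings of the philosopher Pascal"),
    ("Pascal tutorial", "Learn the Pascal programming language"),
    ("Free Pascal compiler", "A compiler for the Pascal programming language"),
]
print(cluster_results("pascal", results))
```

Only the retrieved results are analysed, after the search, so no pre-processing of the whole document collection is needed.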

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
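The scheme of a multi-threaded client program can be sketched as follows. This is only an illustration under stated assumptions and does not describe how Copernic works: engine_a and engine_b are hypothetical stand-ins that return fixed lists instead of contacting real search systems.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    """Hypothetical search system A (returns invented URLs)."""
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]

def engine_b(query):
    """Hypothetical search system B (returns invented URLs)."""
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]

def multi_search(query, engines):
    """Send the same query to all engines in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        # result() waits until the corresponding engine has answered.
        return {name: future.result() for name, future in futures.items()}

results = multi_search("tides", {"A": engine_a, "B": engine_b})
for name, urls in results.items():
    print(name, urls)
```

Because the queries run in parallel, the total waiting time is close to that of the slowest search system rather than the sum of all of them.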

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
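Integration of results with removal of repeated results can be sketched as follows. The normalisation rule (ignore "www.", letter case of the host name, a trailing slash and any query string) and the scoring rule (sum of 1/rank) are assumptions chosen for illustration; actual meta-search systems may merge differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Merge ranked URL lists from several engines and remove duplicates."""
    merged = {}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalise(url), {"url": url, "engines": [], "score": 0.0}
            )
            entry["engines"].append(engine)
            entry["score"] += 1.0 / rank  # higher rank gives a higher score
    return sorted(merged.values(), key=lambda e: -e["score"])

# Invented result lists from two hypothetical primary search systems.
lists = {
    "engine1": ["http://www.vivisimo.com", "http://www.kartoo.com"],
    "engine2": ["http://vivisimo.com/", "http://www.mamma.com"],
}
for entry in merge_results(lists):
    print(entry["url"], entry["engines"], entry["score"])
```

A result found by several primary search systems is shown only once and moves up in the merged list.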

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems accept a query in words from the
searcher, but do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
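The difference between a modified page and a new page can be shown with a small sketch. It assumes that the page contents have already been fetched and that the service stores one fingerprint (here a SHA-256 hash) per URL; the function names and URLs are hypothetical and do not describe a particular alert service.

```python
import hashlib

def fingerprint(page_content: str) -> str:
    """Short, fixed-length summary of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()

def check_pages(previous, current_pages):
    """Compare stored fingerprints with freshly fetched page contents.

    previous      : dict url -> fingerprint from the last check
    current_pages : dict url -> page content fetched now
    Returns (report, updated fingerprints).
    """
    report = {"new": [], "modified": [], "unchanged": []}
    updated = dict(previous)
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
        updated[url] = digest
    return report, updated

# First check: everything is new. Second check: one page changed, one appeared.
report1, state = check_pages({}, {"http://example.org/a": "version 1"})
report2, state = check_pages(
    state,
    {"http://example.org/a": "version 2", "http://example.org/b": "hello"},
)
print(report2)
```

Tracking modifications only needs the list of URLs chosen by the user; finding new pages also needs a source of candidate URLs, for instance the results of a stored search query.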

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• The catalogues of libraries are useful mainly to find
older books.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
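
The workflow described above can be sketched in a few lines of Python. The tiny thesaurus below is invented for illustration (its entries and the function names are assumptions, not taken from any real thesaurus system). It only shows how terms found through the UF and NT relations could be combined with OR, so that the resulting query covers a concept more completely.

```python
# Hypothetical mini-thesaurus: every entry here is an illustrative assumption.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["marine environment"],
        "UF": ["ocean"],
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found via the given relations."""
    record = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in record.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts):
    """OR together the variants of each concept, AND together the concepts."""
    groups = []
    for concept in concepts:
        # Multi-word terms are quoted so that they are searched as a phrase.
        variants = ['"%s"' % t if " " in t else t for t in expand_term(concept)]
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms and narrower terms mainly raises recall; leaving out the BT and RT terms, as this sketch does by default, is one simple way to avoid lowering precision too much.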

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
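
Both directions can be illustrated with a small Python sketch. The document identifiers and reference lists below are invented. The sketch assumes only that each document's list of references is known, and shows that a citation index is simply the inversion of those lists.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990"],
    "c_2003": ["seed_2000"],
    "d_2004": ["seed_2000", "c_2003"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, transitively (snowball effect)."""
    found, queue = [], [seed]
    while queue:
        current = queue.pop(0)
        for cited in references.get(current, []):
            if cited not in found:
                found.append(cited)
                queue.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return index


older = snowball("seed_2000", REFERENCES)                  # towards the past
newer = build_citation_index(REFERENCES)["seed_2000"]      # towards the future
print(older, newer)
```

The length of an entry in such an index is the number of citations received by that document, within the limits of the data that the index covers.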

211

***-

Information retrieval:
using citations (scheme)
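[Scheme: a time axis (Past, Now, Future) with a seed document;
snowball citation searching points from the seed document towards the past,
citation indexing points from the seed document towards the future.]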

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
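
A short Python sketch of generating such variants follows. It assumes that the default document of the site is called index.html or index.htm, which depends on the web server, so the list it returns is illustrative and not complete.

```python
def url_variants(url):
    """Return common spellings of the same address (illustrative, not complete)."""
    base = url.rstrip("/")
    # Assumption: the default document is named index.html or index.htm.
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, and the result lists can be merged afterwards.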

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 9

1


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
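
The difference between the two orderings can be sketched in Python. The directory entries and their inbound link counts are invented. Counting links is only the simplified view stated above, not the actual link analysis performed by Google.

```python
# Hypothetical directory entries with made-up inbound link counts.
entries = [
    {"title": "Ocean Data Centre", "inbound_links": 340},
    {"title": "Alpha Marine Lab", "inbound_links": 12},
    {"title": "Zeta Fisheries Portal", "inbound_links": 95},
]

# Alphabetical order, as described above for the DMOZ directory.
alphabetical = sorted(entries, key=lambda e: e["title"])

# Simplified link-based order: the more links received, the higher the rank.
by_links = sorted(entries, key=lambda e: e["inbound_links"], reverse=True)

print([e["title"] for e in alphabetical])
print([e["title"] for e in by_links])
```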

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory can lead to:
• the complete WWW
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
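
The two components can be illustrated with a minimal Python sketch of an inverted index. The pages below stand in for what a crawler would have collected; their URLs and texts are invented. Real systems add ranking, updating and storage on a much larger scale, none of which is shown here.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a "robot"; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and sea level data",
    "http://example.org/c": "Marine science portals",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query word (implicit AND), sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine sea"))
```

A query is answered from the index only; the pages themselves are not contacted at search time, which is why such a system can answer quickly but may return outdated results.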

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine Alltheweb, and the Internet database
Inktomi have been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
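
The two ranking criteria above can be illustrated with a simplified, PageRank-style calculation in Python. This is a sketch under stated assumptions and not the algorithm that Google actually runs: the link graph is invented, and the damping value of 0.85 and the number of iterations are conventional choices made here for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"], "e": ["a"]}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this invented graph, page c ranks highest because many pages point to it. Pages a and d each receive exactly one link, but d ranks higher because its link comes from the "important" page c.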

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
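
What such an expansion amounts to can be sketched in Python. The synonym table is hypothetical, because the synonym list that Google uses is not described here; the sketch only shows the principle of replacing a word marked with a tilde by an OR group.

```python
# Hypothetical synonym table, invented for this illustration.
SYNONYMS = {"car": ["automobile", "vehicle"]}


def expand_tilde(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            variants = [word] + synonyms.get(word.lower(), [])
            if len(variants) > 1:
                parts.append("(" + " OR ".join(variants) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_tilde("cheap ~car rental"))
```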

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers access not only to files in HTML format, but also to
PDF files.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
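
Recall and precision, as used above, can be computed as follows. This is a minimal sketch; the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# 2 of the relevant ones were actually found.
p, r = precision_recall(retrieved=["a", "b", "c", "d"], relevant=["a", "b", "e"])
print(p, r)
```

A specialised system can raise precision by excluding off-topic sources, and raise recall by covering its subject more completely than a general index does.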

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
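
What truncation does in systems that do offer it can be sketched as follows: a truncated term such as librar* is expanded into all matching words of the index vocabulary. The vocabulary below is a made-up assumption; a searcher can build the same OR query by hand when a system lacks truncation.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))

# Hypothetical vocabulary of an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}

terms = expand_truncation("librar*", vocabulary)
query = " OR ".join(terms)
print(query)
```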

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
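
A proximity operator such as NEAR can be sketched as a check on word positions. This is only an illustration under simple assumptions (words split on spaces, no punctuation handling); the default distance of 5 words is an arbitrary choice.

```python
def near(text, term_a, term_b, max_distance=5):
    """True when term_a and term_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a]
    positions_b = [i for i, word in enumerate(words) if word == term_b]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "open access to scientific information sources"
close = near(text, "open", "information")                  # 4 words apart
not_close = near(text, "open", "information", max_distance=2)
print(close, not_close)
```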

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
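
The idea of clustering results on the fly can be sketched as follows. This is NOT the algorithm of Vivisimo, which is not described here; it is a deliberately naive stand-in that labels each result with its most widely shared word. The titles and the stopword list are invented for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "of", "the"}

def cluster(titles, query):
    """Group result titles under the word they share with most other titles."""
    query_words = set(query.lower().split())
    term_sets = []
    for title in titles:
        words = set(title.lower().split())
        term_sets.append(words - STOPWORDS - query_words)
    # Number of titles in which each remaining word occurs.
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        # Most widely shared word; ties are broken alphabetically.
        label = max(terms, key=lambda t: (doc_freq[t], t)) if terms else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal language",
    "Pensees of the philosopher Pascal",
]
clusters = cluster(titles, "pascal")
for label, members in clusters.items():
    print(label, members)
```

Only the retrieved titles are analysed, after the search, so no pre-processing of the document collection is needed.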

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
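
The client-based scheme above can be sketched as a program that sends one query to several search systems in parallel threads. The two "engines" below are stand-in functions that return made-up URLs; a real client such as Copernic would contact real search systems over the Internet, and how it does so is not described here.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    """Stand-in for a remote search system (hypothetical results)."""
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]

def engine_b(query):
    """Stand-in for a second remote search system (hypothetical results)."""
    return [f"http://b.example/{query}/1"]

def meta_search(query, engines):
    """Send the same query to every engine in its own thread."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("pascal", {"A": engine_a, "B": engine_b})
print(results)
```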

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
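
The integration of results with removal of repeated results can be sketched as follows. The merging rule (round-robin over the ranked lists) and the URL normalisation are assumptions chosen for the illustration; real meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

list_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
list_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([list_1, list_2])
print(merged)
```

Note that this sketch ignores query strings, so two different result pages on the same path would be treated as one.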

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes;
also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
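
How an alert service can distinguish modified pages from new pages can be sketched as follows. The sketch assumes that the page texts have already been fetched; the URLs and texts are invented, and storing a SHA-256 fingerprint per URL is just one simple possible approach.

```python
import hashlib

def fingerprint(page_text):
    """Short, fixed-length summary of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def detect_changes(stored, current_pages):
    """Return (new_urls, changed_urls) and update the stored fingerprints."""
    new, changed = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return new, changed

stored = {}  # fingerprints kept between two visits
first = detect_changes(stored, {"http://example.org/a": "version 1"})
second = detect_changes(stored, {
    "http://example.org/a": "version 2",   # modified page
    "http://example.org/b": "brand new",   # new page
})
print(first, second)
```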

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
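
Matching a new book against a stored interest profile can be sketched as follows. The profile fields, the book record and the matching rule (a keyword in the title OR a matching subject category) are assumptions for the illustration, not the method of any particular bookshop.

```python
def matches_profile(book, profile):
    """True when the book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keywords = {keyword.lower() for keyword in profile["keywords"]}
    keyword_hit = bool(title_words & keywords)
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical profile stored on the server and two hypothetical new books.
profile = {"keywords": ["aquaculture", "fisheries"], "categories": {"Marine science"}}
book_1 = {"title": "Sustainable Aquaculture Systems", "category": "Agriculture"}
book_2 = {"title": "Medieval Castles", "category": "History"}

alerts = [book["title"] for book in (book_1, book_2)
          if matches_profile(book, profile)]
print(alerts)
```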

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
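The procedure described above (look up a term in a thesaurus, then use the terms found to formulate the query) can be sketched in a few lines of Python. This is only a minimal illustration: the miniature thesaurus, its entries and the function names expand_term and build_query are invented for this example and do not correspond to any particular thesaurus system or database.

```python
# Hypothetical miniature thesaurus; the entries are invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["water body"],    # broader term
        "NT": ["coastal sea"],   # narrower term
        "RT": ["ocean"],         # related term
        "UF": ["seas"],          # synonym / variant
    },
}


def expand_term(term, relations=("UF", "NT", "RT")):
    """Return the term plus the thesaurus terms found via the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts):
    """Combine concepts with AND and the alternatives for one concept with OR."""
    groups = []
    for concept in concepts:
        alternatives = expand_term(concept)
        # Multi-word terms are quoted so that they are searched as a phrase.
        quoted = ['"%s"' % a if " " in a else a for a in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms and narrower terms with OR mainly increases recall; choosing the more suitable terms mainly increases precision. The exact Boolean syntax differs from one database to another, so the generated string is only a starting point.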

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
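The two directions of citation searching can be illustrated with a small Python sketch. The citation data and the document identifiers below are purely hypothetical; a real citation index holds the same kind of information on a much larger scale.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["a1990", "b1995"],
    "b1995": ["a1990", "c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def snowball(seed, references):
    """Follow reference lists back in time, transitively (snow-ball effect)."""
    found, todo = set(), [seed]
    while todo:
        document = todo.pop()
        for cited in references.get(document, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def cited_by(target, references):
    """Invert the reference lists: which more recent documents cite the target?"""
    return {doc for doc, refs in references.items() if target in refs}


print(sorted(snowball("seed", REFERENCES)))   # older publications
print(sorted(cited_by("seed", REFERENCES)))   # more recent publications
```

The number of elements returned by cited_by is the simple citation count of the seed document within this (hypothetical) data set.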

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (Past, Now, Future). Starting from the seed
document, snowball citation searching points towards the past;
citation indexing points towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
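The variants listed above can be generated mechanically before they are submitted to a link search. The following Python sketch assumes that index.html or index.htm is the default page of a directory; this differs between web servers, and the function name url_variants is chosen only for this illustration.

```python
def url_variants(url, default_pages=("index.html", "index.htm")):
    """Return the spellings under which the same page may be linked."""
    variants = [url]
    for page in default_pages:
        if url.endswith("/" + page):
            directory = url[: -len(page)]          # keeps the trailing slash
            variants += [directory, directory.rstrip("/")]
            return variants
    if url.endswith("/"):
        variants.append(url.rstrip("/"))           # same address without slash
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant then has to be searched separately, using the link query syntax of the chosen search system, and the results have to be combined.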

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 10


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
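The mechanism described above (collected pages, an index built automatically from their texts, and a search function on top of that index) can be sketched as follows in Python. This is a toy illustration under stated assumptions: the pages and URLs are invented, the "robot" that collects pages is replaced by a fixed dictionary, and the query words are combined with an implicit AND. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Sea level and hydrology",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(sorted(search(index, "marine hydrology")))
```

Because the query is answered from the index and not from the live Internet, the result can be returned quickly, but it only reflects the state of the pages at the time they were collected.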

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine Alltheweb and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
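The idea that a page ranks higher when many pages, and in particular “important” pages, point to it can be illustrated with a simplified link-analysis computation in Python. This is only a sketch in the spirit of that idea, not the algorithm actually used by Google: the link data are invented, and the function name, the damping value of 0.85 and the number of iterations are assumptions made for this example.

```python
def simplified_link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page passes part of its score to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # The score of a page is shared among the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure: a, b and d point to c; c points to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = simplified_link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example, page c scores highest because many pages point to it, and page a scores higher than b and d because it receives its only link from the “important” page c.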

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several well-known, large Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
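
The gain from using more than one index can be illustrated with sets. The page identifiers below are invented for illustration and do not describe any real index.

```python
def coverage_report(index_a, index_b):
    """Return (number of pages found in at least one index,
    number of pages found in both indexes)."""
    combined = set(index_a) | set(index_b)
    overlap = set(index_a) & set(index_b)
    return len(combined), len(overlap)


# Two hypothetical indexes that each cover a different part of the WWW.
index_a = {"page1", "page2", "page3"}
index_b = {"page3", "page4"}
combined_size, overlap_size = coverage_report(index_a, index_b)
print(combined_size, overlap_size)  # 4 1
```

Searching both indexes reaches four pages here, although neither index alone covers more than three.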

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
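
Recall and precision, as mentioned above, can be computed from the set of retrieved documents and the set of relevant documents. The document identifiers in this sketch are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved.
    Recall = relevant retrieved / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant ones exist.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d5"])
print(p, r)
```

In this example two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).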

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
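
Without truncation, the searcher has to spell out the word variants. A minimal sketch of that workaround follows; the function name expand_truncation and the small vocabulary are assumptions made for illustration, and the resulting OR group still has to be accepted by the search system that is used.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR group
    of the matching words from a known vocabulary."""
    if not pattern.endswith("*"):
        return pattern
    prefix = pattern[:-1]
    matches = sorted(w for w in vocabulary if w.startswith(prefix))
    return "(" + " OR ".join(matches) + ")" if matches else prefix


# Illustrative vocabulary; a real one would come from the searcher.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", vocabulary))
# (librarian OR libraries OR library)
```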

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
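
To illustrate what a proximity operator such as NEAR does, the following sketch checks whether two terms occur within a given number of words of each other. It is a simplified illustration applied to a piece of text, not a description of how any search engine implements the operator; the default distance of 5 words is an assumption.

```python
def near(text, term_a, term_b, max_distance=5):
    """Return True when term_a and term_b occur at most
    max_distance word positions apart in the text."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


close = near("open access journals are free", "open", "journals", 2)
far = near("open archives contain many freely available journals",
           "open", "journals", 2)
print(close, far)  # True False
```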

96

**--

Meta- search systems:
scheme 1
[Diagram with these elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram with these elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with these elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
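
The idea of clustering results on the fly can be illustrated with a toy example. This is NOT the method used by Vivisimo, which is not described here; it is only a simple sketch, under the assumption that each result title is labelled with its most widely shared word, and the titles are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_titles(titles, query):
    """Group result titles under the word (other than the query words
    and stopwords) that occurs in the largest number of titles."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = {w for w in title.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
    # Number of titles in which each word occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        if words:
            # sorted() makes ties resolve alphabetically.
            label = max(sorted(words), key=lambda w: doc_freq[w])
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
clusters = cluster_titles(titles, "pascal")
for label, members in clusters.items():
    print(label, members)
```

The grouping is computed only from the retrieved titles at search time, so nothing has to be prepared before the search.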

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with these elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]
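
A client-based, multi-threaded search program can be sketched as follows. The two "engines" are stand-in functions that return invented URLs; a real client would send the query to actual search systems over the Internet, which is left out here so that the sketch stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # Stand-in for a real search system (hypothetical results).
    return [f"http://one.example/{query}/1", f"http://shared.example/{query}"]


def engine_two(query):
    # Stand-in for a second search system (hypothetical results).
    return [f"http://shared.example/{query}", f"http://two.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel threads and
    return the combined results without repeated URLs."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged = []
    for results in result_lists:
        for url in results:
            if url not in merged:
                merged.append(url)
    return merged


results = meta_search("pascal", [engine_one, engine_two])
print(results)
```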

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based
information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
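
One possible way to integrate ranked result lists and remove repeated results is to interleave the lists rank by rank. This is a sketch under that assumption, not the method of any particular meta-search system; the URLs are invented.

```python
def merge_results(result_lists):
    """Interleave ranked result lists (first all rank-1 results,
    then all rank-2 results, ...) and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


engine_1 = ["http://a.example/", "http://b.example/"]
engine_2 = ["http://b.example/", "http://c.example/", "http://a.example/"]
merged = merge_results([engine_1, engine_2])
print(merged)
# ['http://a.example/', 'http://b.example/', 'http://c.example/']
```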

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram with these elements: Internet / WWW; telnet, ftp, ...; databases and file archives accessible through the Internet (CGI, ASP, ...); static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
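
Tracking changes in a page can be sketched by comparing fingerprints of the page text at two moments. The page contents below are invented strings; fetching the page and sending the email alert are left out, and the use of SHA-256 is a choice made for this illustration, not a statement about how existing services work.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the page text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, page_text):
    """Compare the stored fingerprint with the current page text."""
    return fingerprint(page_text) != old_fingerprint


# Hypothetical page contents at two moments in time.
stored = fingerprint("Conference programme, version 1")
unchanged = has_changed(stored, "Conference programme, version 1")
changed = has_changed(stored, "Conference programme, version 2")
print(unchanged, changed)  # False True
```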

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
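
Finding new pages for a stored query amounts to comparing the current result list with the results seen earlier. A minimal sketch with invented URLs; the function name new_results is hypothetical.

```python
def new_results(previous_urls, current_urls):
    """Return the URLs in the current results that were not seen before,
    keeping the order of the current result list."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]


last_week = ["http://a.example/", "http://b.example/"]
today = ["http://b.example/", "http://c.example/", "http://a.example/"]
fresh = new_results(last_week, today)
print(fresh)  # ['http://c.example/']
```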

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: ?
To find book titles published before 1990: ?
To find a book title through a title search: ?
To find the price of a book: ?
To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
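
An interest profile in the form of keywords can be matched against new book titles as in the following sketch. The titles and keywords are invented, and real services may match on subject categories or descriptions instead of title words.

```python
def matches_profile(title, keywords):
    """True when at least one profile keyword occurs as a word in the title."""
    words = set(title.lower().split())
    return any(keyword.lower() in words for keyword in keywords)


def book_alerts(new_titles, keywords):
    """Select the new titles that fit the stored interest profile."""
    return [title for title in new_titles if matches_profile(title, keywords)]


profile = ["oceanography", "fisheries"]
new_titles = [
    "Introduction to Oceanography",
    "Cooking for Beginners",
    "Fisheries Management in Europe",
]
selected = book_alerts(new_titles, profile)
print(selected)
# ['Introduction to Oceanography', 'Fisheries Management in Europe']
```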

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general: Library of Congress, British Library, Infoball
To find the price of a book: Global Books in Print, Infoball, online bookshops
To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also the origin with its readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
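
The use of a thesaurus to prepare a query, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy thesaurus, its terms, and the function names `expand_term` and `build_query` are hypothetical, and the Boolean syntax (AND, OR, quoted phrases) has to be adapted to the database that is actually searched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThesaurusEntry:
    """Relations of one term: BT, NT, RT and UF."""
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT
    used_for: List[str] = field(default_factory=list)  # UF (synonyms)


# Hypothetical toy thesaurus; the entries are invented for illustration.
THESAURUS: Dict[str, ThesaurusEntry] = {
    "ocean": ThesaurusEntry(
        broader=["body of water"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_term(term, thesaurus, include_narrower=True):
    """Return the term plus its synonyms (and optionally narrower terms)."""
    entry = thesaurus.get(term.lower())
    terms = [term]
    if entry is None:
        return terms  # term not in the thesaurus: keep it unchanged
    terms.extend(entry.used_for)
    if include_narrower:
        terms.extend(entry.narrower)
    return terms


def build_query(concepts, thesaurus):
    """Combine concepts with AND and the alternatives of each concept with OR."""
    groups = []
    for concept in concepts:
        alternatives = expand_term(concept, thesaurus)
        # Quote multi-word terms so that they are searched as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(build_query(["ocean", "pollution"], THESAURUS))
```

Adding synonyms and narrower terms with OR mainly increases recall; combining the concepts with AND keeps precision under control.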

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
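
The two directions of citation searching can be sketched as follows. The sketch assumes a small, invented set of reference lists; the document identifiers and the function names are hypothetical. Following the reference lists leads to older work (the snowball effect), while inverting them produces a citation index that leads to more recent work.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the earlier
# documents that it cites.
REFERENCES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990"],
    "c-2003": ["seed-2000"],
    "d-2004": ["seed-2000", "c-2003"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def snowball(seed, references):
    """Follow reference lists backwards in time, also through cited documents."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


cited_by = build_citation_index(REFERENCES)
print(sorted(snowball("seed-2000", REFERENCES)))  # older publications
print(sorted(cited_by["seed-2000"]))              # more recent publications
```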

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
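
Generating these variants by hand is error-prone, so a small helper can produce them. This is a minimal sketch: the function name `url_variants` is hypothetical, and only the default file names index.html and index.htm are assumed. How each variant is then entered in a query depends on the search system that is used.

```python
def url_variants(url):
    """Return common spellings of a page address, without the scheme."""
    # Drop the scheme (for instance "http://") if it is present.
    address = url.split("://", 1)[-1]
    variants = [address]
    # A default file name such as index.html is often left out of links.
    for default_name in ("index.html", "index.htm"):
        if address.endswith("/" + default_name):
            address = address[: -len(default_name)]
            variants.append(address)
            break
    # The trailing slash is often omitted as well.
    if address.endswith("/"):
        variants.append(address.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```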

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 11


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
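
The mechanism described above can be illustrated with a very small index. This is a sketch only: the pages and their URLs are invented, the crawling step is replaced by a fixed dictionary of texts, and real Internet indexes add ranking, word stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what the "robot" would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library and information science",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs of pages containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(sorted(search(index, "marine science")))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is why such systems can respond quickly.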

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; Google also
indexes the first part of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
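
The idea that a page ranks higher when many pages and “important” pages point to it can be illustrated with a simplified, PageRank-style calculation. This is a sketch under assumptions, not the actual ranking method of Google: the link graph, the damping value 0.85 and the number of iterations are chosen only for illustration, and the function name `link_rank` is hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A and B link to C, and C links to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

rank = link_rank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph, C scores higher than A and B because two pages point to it, and D scores highest although only one page points to it, because that page (C) is itself “important”.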

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
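
Recall and precision can be made concrete with a small calculation. This is a minimal sketch in Python; the function name `precision_recall` and the document identifiers are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = relevant hits / all retrieved; recall = relevant hits / all relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set of one search and the set of truly relevant documents.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 3 relevant documents are retrieved (recall 0.67).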

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
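
What manual truncation means can be shown with a short sketch in Python. The helper `truncation_match` is hypothetical; it imitates a right-truncated query term such as librar* as offered by some other retrieval systems, and it is not a feature of Google.

```python
import re

def truncation_match(pattern: str, text: str) -> list:
    """Return the words in text that match a term, optionally right-truncated with *."""
    stem = pattern.rstrip("*").lower()
    words = re.findall(r"[a-z]+", text.lower())
    if pattern.endswith("*"):
        # Truncated term: every word that starts with the stem matches.
        return [w for w in words if w.startswith(stem)]
    # No truncation: only the exact word matches.
    return [w for w in words if w == stem]

text = "Libraries and librarians maintain the library catalogue."
print(truncation_match("librar*", text))
print(truncation_match("library", text))
```

Without truncation, the searcher has to enter each word variant separately to obtain the same coverage.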

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
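
A proximity operator checks how close two query words are in a document. The following minimal sketch in Python shows the idea; the function `near` and its parameter `max_distance` are hypothetical and do not correspond to any particular retrieval system.

```python
import re

def near(text: str, term_a: str, term_b: str, max_distance: int = 5) -> bool:
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

text = "Open access journals are indexed by several search engines."
print(near(text, "open", "journals", max_distance=2))
print(near(text, "open", "engines", max_distance=3))
```

A plain AND query would accept both pairs of words; the proximity condition accepts only the pair that occurs close together.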

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User; Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
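
How grouping on the fly can work in principle is sketched below in Python. This is a deliberately naive illustration and not the algorithm of Vivisimo: the function `cluster_results`, the stopword list and the example snippets are all assumptions. Each snippet is placed under the most widely shared word that does not occur in the query.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "was"}

def cluster_results(query: str, snippets: list) -> dict:
    """Group snippets under the most widely shared word that is not in the query."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_words)
    # Count in how many snippets each candidate label occurs.
    document_frequency = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Highest frequency first; alphabetical order breaks ties.
            label = min(words, key=lambda w: (-document_frequency[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Writings of the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, "->", members)
```

The grouping is computed only from the results returned for the query, so no pre-processing of the documents is needed.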

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User; Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be searched one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
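
The integration of results with removal of repeated results can be sketched as follows in Python. The functions `normalise` and `merge_results` and the example.org addresses are hypothetical; real meta-search systems may merge and rank in quite different ways.

```python
from urllib.parse import urlsplit

def normalise(url: str) -> str:
    """Reduce a URL to host + path so that trivial variants compare as equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Simplification: the query string and fragment are ignored.
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists and drop results that were already seen."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/a", "http://example.org/b/"]
engine_b = ["http://example.org/a/", "http://example.org/c"]
print(merge_results(engine_a, engine_b))
```

The first result of each system is taken before the second results, so that no single system dominates the merged list.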

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
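
The principle of tracking modifications in an existing page can be sketched in Python as follows. The helpers `fingerprint` and `has_changed` are hypothetical; downloading the page is left out, and the alert services mentioned here may work differently.

```python
import hashlib

def fingerprint(page_text: str) -> str:
    """Return a hash of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_changed(old_fingerprint: str, new_page_text: str) -> bool:
    """Compare the stored fingerprint with the fingerprint of the current text."""
    return fingerprint(new_page_text) != old_fingerprint

# The fingerprint is stored at the first visit and compared at later visits.
stored = fingerprint("Hello  world")
print(has_changed(stored, "Hello world\n"))
print(has_changed(stored, "Hello new world"))
```

Only the short fingerprint has to be stored for each tracked page, not the complete text.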

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
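
Matching a new book against a stored interest profile can be sketched as follows in Python. The field names (`title`, `category`, `keywords`, `categories`) and the function `matches_profile` are assumptions for the illustration; they do not describe the service of any particular bookshop.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True if a title word equals a profile keyword or the category is in the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

# Hypothetical interest profile stored on the server for one user.
profile = {"keywords": ["retrieval"], "categories": ["Library science"]}

new_books = [
    {"title": "Information Retrieval Basics", "category": "Computing"},
    {"title": "Cataloguing in Practice", "category": "Library science"},
    {"title": "Garden Design", "category": "Hobbies"},
]
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
```

The first book matches through a keyword, the second through its subject category, and the third does not match.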

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
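
Simultaneous, parallel searching can be sketched in Python with a thread pool. The catalogues below are hypothetical in-memory stand-ins; a real service would send the query to each target system, for instance over HTTP or Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of libraries and bookdealers.
CATALOGUES = {
    "Library A": ["Marine ecology", "Coastal management"],
    "Library B": ["Marine ecology", "Aquaculture handbook"],
    "Bookdealer C": [],
}

def search_catalogue(name: str, term: str) -> list:
    """Search one catalogue; returns (catalogue, title) pairs."""
    return [(name, title) for title in CATALOGUES[name] if term.lower() in title.lower()]

def search_all(term: str) -> list:
    """Send the same query to all catalogues in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, term) for name in CATALOGUES]
        results = []
        for future in futures:
            # Results are collected in the order in which the searches were submitted.
            results.extend(future.result())
    return results

print(search_all("marine"))
```

A target system that is unavailable or returns nothing, like the third catalogue here, simply contributes no results, which is why the outcome depends on the availability of the target systems.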

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
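The two-step approach above (consult a thesaurus first, then formulate the query) can be sketched in Python. This is only an illustration: the mini-thesaurus, its terms, and the function name expand_query are hypothetical, and the OR syntax is an assumption, since each search system has its own query language.

```python
# Hypothetical mini-thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader terms
        "NT": ["bay", "gulf"],      # narrower terms
        "RT": ["coast"],            # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return an OR-query combining the term with its thesaurus terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))  # sea OR ocean OR bay OR gulf
```

Adding synonyms and narrower terms mainly increases recall; which relations to include is a choice the searcher makes per query.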

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
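The two directions of citation searching can be sketched in Python as follows. The citation data and the function names are hypothetical; real citation indexes are far larger and are built from the reference lists of published documents.

```python
# Hypothetical citation data: each key cites the documents in its list.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time (the snowball effect)."""
    found = []
    todo = list(cites.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            todo.extend(cites.get(doc, []))
    return found


def build_citation_index(cites):
    """Invert the reference lists: for each document, list who cites it."""
    index = {}
    for citing, cited_list in cites.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


older = snowball("seed", CITES)                      # older publications
newer = build_citation_index(CITES).get("seed", [])  # more recent publications
print(older, newer)
```

The length of a document's entry in the citation index is the number of citations it has received within this data set.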

211

***-

Information retrieval:
using citations (scheme)
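[Scheme: a time axis (past, now, future) with a seed document. Snowball citation searching leads from the seed document to older publications; citation indexing leads to more recent publications.]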

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
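Generating these variants by hand is error-prone, so a small helper can list them. The sketch below is a minimal illustration; the function name url_variants and the assumption that the default page is called index.html or index.htm are not from the slides.

```python
def url_variants(url):
    """Return common spellings of a URL that other pages may link to."""
    # Drop the scheme, since the variants above start at "//".
    rest = url.split("://", 1)[-1]
    variants = [rest]
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]  # keep the trailing slash
            variants.append(rest)
            break
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, following the query syntax of the chosen search system.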

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific
web document or web site that we already know, we can
NOT ONLY use an Internet index search engine with
specialised searching in hyperlink fields on web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that specific web document or
site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
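The difference between alphabetical sorting and ranking by links received can be sketched as follows. This is a deliberately simple illustration with hypothetical site names; it counts incoming links only and is not the actual link analysis used by Google.

```python
from collections import Counter


def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by number of links received, most linked first."""
    # links is a list of (source, target) pairs.
    counts = Counter(target for _source, target in links)
    # Ties are broken alphabetically.
    return sorted(directory_entries, key=lambda site: (-counts[site], site))


ENTRIES = ["alpha.org", "beta.org", "gamma.org"]
LINKS = [("x.org", "gamma.org"), ("y.org", "gamma.org"), ("z.org", "beta.org")]

print(sorted(ENTRIES))                  # alphabetical order
print(rank_by_inlinks(ENTRIES, LINKS))  # order by links received
```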

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
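The mechanism described above can be sketched in Python as a small inverted index. The pages are hypothetical stand-ins for what a crawler would collect, and combining the query words with an implicit AND is an assumption of this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "marine"))
print(search(INDEX, "marine science"))
```

Because the query is answered from the prebuilt index, no page has to be fetched from the Internet at search time.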

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
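The idea that a page ranks higher when many pages, or important pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not Google's actual algorithm: the link data, the damping value 0.85 and the function name rank_pages are chosen for illustration only.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a dict: page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical link structure: "c" receives two links, "d" one link from "c".
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
SCORES = rank_pages(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this example "c" scores well because two pages point to it, and "d" scores well because the important page "c" points to it.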

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
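
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration under stated assumptions: the synonym table and the function name `expand_query` are hypothetical, and how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)
```

For instance, `expand_query("rent ~car brussels")` turns only the marked word into an OR-group and leaves the other words untouched.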

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers access not only to files in HTML format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.

• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.

• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
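
Recall and precision, as used above, can be computed for a single search when the set of relevant documents is known. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the function name and the document identifiers are illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents returned by the search system.
    relevant:  documents that actually answer the information need.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant documents are among those retrieved.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

A specialised index aims to raise both numbers for its own subject: fewer irrelevant hits (precision) and fewer missed documents (recall).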

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
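
To make clear what is missing, here is a minimal sketch of right-hand truncation as it is offered by many classical retrieval systems. The `*` truncation symbol, the function name and the small vocabulary are assumptions chosen for illustration only.

```python
def truncation_matches(pattern, vocabulary):
    """Return the vocabulary words matched by a right-truncated term such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical index vocabulary.
words = ["library", "libraries", "librarian", "liberty"]
matched = truncation_matches("librar*", words)
```

Without truncation or stemming, the searcher has to type such word variants manually, for instance combined with OR.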

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
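
A proximity operator such as NEAR tests whether two words occur close to each other in a text. The following sketch shows the idea under stated assumptions: the function name `near`, the default distance of 5 words and the simple tokenisation are hypothetical and do not describe any particular search system.

```python
def near(text, word_a, word_b, max_distance=5):
    """True when word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sentence = "Open access journals are listed in the directory"
```

Here `near(sentence, "open", "journals", 2)` holds, while `near(sentence, "open", "directory", 3)` does not, because the two words are 7 positions apart.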

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
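
Clustering on the fly can be illustrated with a deliberately simple sketch that groups result snippets under the most widely shared term they contain. This is an assumption-laden toy, not the method used by Vivisimo; the stopword list, the function name and the snippets (based on the "pascal" ambiguity example) are all hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was"}


def cluster_results(snippets, query):
    """Group result snippets under the most widely shared term they contain."""
    query_terms = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        terms = {w.strip(".,;:!?()").lower() for w in snippet.split()}
        tokenised.append(terms - STOPWORDS - query_terms - {""})
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for terms in tokenised for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, tokenised):
        shared = [t for t in terms if doc_freq[t] >= 2]
        # Pick the most frequent shared term; break ties alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Free Pascal compiler for the programming language",
    "Pascal unit of pressure",
]
clusters = cluster_results(snippets, "pascal")
```

Only the retrieved snippets are analysed, after the search, which is what "on the fly" means here: no index of cluster labels has to be built in advance.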

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
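
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, assuming that results are plain URLs and that two URLs are "the same" when host and path agree; the function names and the normalisation rule are assumptions, not the method of any specific meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    plus the path without trailing slash. Query strings are ignored."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([engine_a, engine_b])
```

Interleaving by rank keeps the top hit of every engine near the top of the merged list; other merging strategies are equally possible.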

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
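
How a program or alert service can notice modified and new pages may be sketched like this. It is a minimal illustration under stated assumptions: page texts are supplied as already downloaded strings, and the function names and example URLs are hypothetical. Real services also have to fetch the pages and send the email alerts.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of a page's text content."""
    normalised = " ".join(page_text.split())  # ignore whitespace-only changes
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots (dicts mapping URL -> page text).

    Returns (new_urls, changed_urls), each as a sorted list.
    """
    new_urls = sorted(url for url in current if url not in previous)
    changed_urls = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_urls, changed_urls


# Hypothetical snapshots taken on two different days.
yesterday = {
    "http://example.org/a": "old text",
    "http://example.org/b": "same  text",
}
today = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "fresh page",
}
new_urls, changed_urls = detect_changes(yesterday, today)
```

Storing only fingerprints instead of full page texts keeps the stored snapshot small, at the cost of not being able to show what exactly has changed.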

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even services that compare the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
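
Matching new books against a stored interest profile of keywords can be sketched as follows. The record layout (`title`, `subjects`), the function name and the example books are hypothetical; the sketch does not describe how Amazon or any other bookshop implements its alerts.

```python
def matching_books(new_books, profile_keywords):
    """Return titles of new books whose title or subjects mention a profile keyword.

    Matching is a simple case-insensitive substring test.
    """
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for book in new_books:
        text = (book["title"] + " " + " ".join(book.get("subjects", []))).lower()
        if any(k in text for k in keywords):
            alerts.append(book["title"])
    return alerts


# Hypothetical records of newly published books.
new_books = [
    {"title": "Marine Ecology Handbook", "subjects": ["oceanography"]},
    {"title": "Medieval Cooking", "subjects": ["history"]},
]
alerts = matching_books(new_books, ["ecology", "oceanography"])
```

A profile based on subject categories would work in the same way, with the category codes of each book taking the place of free keywords.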

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
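
Simultaneous, parallel searching over several target systems can be sketched with threads. This is a minimal illustration under stated assumptions: each catalogue is represented by a hypothetical Python function instead of a real network connection (such as Z39.50), and a failing target simply yields no results.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, catalogues):
    """Send one query to several catalogues in parallel.

    catalogues: dict mapping a catalogue name to a function query -> list of titles.
    Returns a dict mapping each name to its results, or to an empty list when
    the target system fails, since availability of the targets is not guaranteed.
    """
    def safe_search(search):
        try:
            return search(query)
        except Exception:
            return []

    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        futures = {name: pool.submit(safe_search, fn) for name, fn in catalogues.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for two target catalogues.
def library_a(query):
    return [query + " handbook"]


def library_b(query):
    raise TimeoutError("catalogue unavailable")


results = search_all("hydrology", {"A": library_a, "B": library_b})
```

The example shows why the result depends on the target systems: catalogue "B" is unavailable, so only the hits from catalogue "A" reach the user.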

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
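
The use of a thesaurus described above can be sketched in a few lines of Python. This is only a minimal illustration: the mini-thesaurus, its entries and the function name `expand_query` are hypothetical, and the `OR` syntax is an assumption that must be checked against the database that is actually searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only
# and are not taken from any real thesaurus system.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["water body"],       # broader terms
        "NT": ["coastal sea"],      # narrower terms
        "RT": ["marine science"],   # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with selected thesaurus terms into one OR-query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"' + t + '"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms tends to increase recall; which relations to include remains a choice of the searcher.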

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
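
The two directions of citation searching can be illustrated with a small Python sketch. The citation data below are invented for the example, and the function names are hypothetical; a real citation index holds the same kind of information on a much larger scale.

```python
# Hypothetical citation data: each document is mapped to the
# documents that it cites (its reference list).
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, cites):
    """Backward search: follow reference lists from the seed to older work."""
    found = []
    seen = {seed}
    queue = list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc in seen:
            continue
        seen.add(doc)
        found.append(doc)
        # The references of a found document are followed as well.
        queue.extend(cites.get(doc, []))
    return found


def citing_documents(seed, cites):
    """Forward search: the documents whose reference list contains the seed."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print(snowball("seed", CITES))
print(citing_documents("seed", CITES))
```

The backward search needs only the documents themselves; the forward search requires an index that has been built over many documents, which is why citation indexes exist as separate services.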

211

***-

Information retrieval:
using citations (scheme)
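[Scheme: a time axis running from past through now to future. Starting from the seed document, snowball citation searching leads back to older publications; citation indexing leads forward to more recent publications.]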

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
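
The variants listed above can be generated automatically. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes that the default page of the site is called index.html, which depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the forms under which the address of a page may appear in links."""
    # Drop the scheme (http://, https://) if it is present.
    base = url.split("://", 1)[-1]
    # Assumption: the default page is named index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link query, following the syntax of the chosen search system.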

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 13

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
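
The mechanism described above can be sketched in Python as a small inverted index. This is a simplified illustration, not the implementation of any real search engine: the pages are hypothetical and stand in for what a crawler would collect, and the query is interpreted as an implicit AND of all words.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a "robot" would collect.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology databases free of charge",
    "http://example.org/c": "Open archives for computer science",
}


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(PAGES)
print(search(index, "marine science"))
```

The query is answered from the index alone, without visiting the pages again, which is why such systems can respond quickly but may return links to pages that have changed since they were collected.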

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi --
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense that
a query could contain a maximum of 10 words.)
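
The idea of ranking by links can be illustrated with a simplified link-analysis sketch in Python. This is not Google's actual algorithm: the link data are hypothetical, and the damping value of 0.85 and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from highly scored pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    count = len(pages)
    rank = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / count for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / count
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link data: each page is mapped to the pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example page c receives the most links and obtains the highest score, while page a scores higher than b and d because its only incoming link comes from the "important" page c.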

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
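
As a minimal sketch, the query string of this example can be assembled by a small helper. The function name `build_query` is hypothetical, and the tilde syntax is taken from the slide only; it may not apply to other systems or to later versions of Google.

```python
def build_query(words, expand=()):
    """Join query words, putting a tilde in front of the words listed in `expand`."""
    expand_set = set(expand)
    return " ".join("~" + w if w in expand_set else w for w in words)


# Ask for synonym expansion of the second word only.
query = build_query(["word1", "word2", "word3", "word4"], expand=["word2"])
```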

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered at http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
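
The two quality measures named above, recall and precision, can be computed as in the following minimal sketch. The document identifiers and the function name `precision_recall` are hypothetical. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents are relevant, 2 of them were found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```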

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
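
To make clear what truncation means, the sketch below expands a truncated term such as librar* by hand into the matching word variants and joins them with OR. The word list, the function name `expand_truncation` and the assumption that the target system accepts OR between alternatives are all illustrative.

```python
def expand_truncation(pattern, vocabulary):
    """Return the words in `vocabulary` that match a right-truncated pattern."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1]
    return sorted(w for w in vocabulary if w.startswith(stem))


# Hypothetical word list known to the searcher.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
variants = expand_truncation("librar*", vocabulary)
or_query = " OR ".join(variants)
```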

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Diagram labels: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram labels: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Diagram labels: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Diagram labels: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
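
Clustering on the fly can be sketched as follows: only the titles of the retrieved results are analysed, after the search, and each result is filed under the most widely shared word in its title. This is a deliberately naive illustration and not the method used by Vivisimo; the titles, the stopword list and the function name `cluster_results` are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "a", "and", "in", "to", "for"}


def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_terms = set(query.lower().split())
    token_sets = []
    for title in titles:
        tokens = {t for t in title.lower().split()
                  if t not in STOPWORDS and t not in query_terms}
        token_sets.append(tokens)
    # Count in how many titles each word occurs.
    doc_freq = Counter(t for tokens in token_sets for t in tokens)
    clusters = defaultdict(list)
    for title, tokens in zip(titles, token_sets):
        if tokens:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(tokens, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Pascal programming tutorial",
    "Free Pascal programming compiler",
    "Blaise Pascal philosopher biography",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results(titles, "pascal")
```

With these four titles the ambiguous query pascal yields one group labelled "programming" and one labelled "philosopher".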

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram labels: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
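
The integration step can be sketched as follows: the ranked lists from several primary search systems are interleaved and repeated results are removed. The URL normalisation shown here (ignoring a leading "www.", a trailing slash and any query string) is a crude assumption, and the result lists and function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a rough key so that near-identical links compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_a = ["http://www.example.org/a/", "http://example.org/b"]
engine_b = ["http://example.org/a", "http://www.example.org/c"]
merged = merge_results([engine_a, engine_b])
```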

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram labels: telnet, ftp, ...; Internet; WWW; databases and file archives accessible through the Internet (CGI, ASP,...); static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
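
Tracking changes in an existing page can be done, in the simplest case, by storing a fingerprint of the page and comparing it with a fingerprint of a later copy. The sketch below assumes the page text has already been downloaded; the function names and the sample texts are hypothetical, and real alert services may work differently.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 fingerprint of the page text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, new_page_text):
    """Report whether the page differs from the stored fingerprint."""
    return fingerprint(new_page_text) != old_fingerprint


stored = fingerprint("<html>version 1</html>")
unchanged = has_changed(stored, "<html>version 1</html>")
changed = has_changed(stored, "<html>version 2</html>")
```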

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM / RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
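
An interest profile stored as keywords can be matched against new book titles as in this minimal sketch. The profile, the titles and the function name `matches_profile` are hypothetical; the sketch does not describe how any particular bookshop implements its alerts.

```python
def matches_profile(title, keywords):
    """Return True when any profile keyword occurs as a word in the title."""
    words = set(title.lower().split())
    return any(k.lower() in words for k in keywords)


profile = ["oceanography", "aquaculture"]
new_titles = ["Introduction to Oceanography", "Cooking for Beginners"]
alerts = [t for t in new_titles if matches_profile(t, profile)]
```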

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM / RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
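
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus, its entries and the function name expand_query are hypothetical, and the OR syntax must be adapted to the database that is actually searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only and are
# not taken from any real thesaurus system.
THESAURUS = {
    "ocean": {
        "UF": ["sea"],                              # synonyms
        "NT": ["Atlantic Ocean", "Pacific Ocean"],  # narrower terms
        "RT": ["marine science"],                   # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return a Boolean OR query made of the term plus the thesaurus terms found."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; choosing one more suitable term instead of a vague one mainly increases precision.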

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
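
The two directions of citation searching can be illustrated with a small sketch in Python. The citation data and the function names snowball and cited_by are hypothetical; a real citation index covers far more documents.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["old3"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites):
    """Follow reference lists back in time and collect all older documents reached."""
    found = []
    to_visit = list(cites.get(doc, []))
    while to_visit:
        current = to_visit.pop(0)
        if current not in found:
            found.append(current)
            to_visit.extend(cites.get(current, []))
    return found


def cited_by(doc, cites):
    """Act as a citation index: which more recent documents cite this one?"""
    return sorted(source for source, refs in cites.items() if doc in refs)


print(snowball("seed", CITES))   # older publications (snow-ball effect)
print(cited_by("seed", CITES))   # more recent publications citing the seed
```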

211

***-

Information retrieval:
using citations (scheme)
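[Diagram: a time axis with Past, Now and Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications that cite it.]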

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
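
Generating these variants by hand is error-prone, so a small helper can produce them. The following Python sketch is an assumption for illustration: the function name url_variants and the list of index file names are hypothetical, and the way each variant is entered depends on the query syntax of the chosen search system.

```python
def url_variants(url):
    """Return the spellings under which other pages may link to the same document."""
    # Drop the scheme (http://) so that the variants start with "//".
    rest = url.split("://", 1)[-1].lstrip("/")
    base = "//" + rest
    variants = [base]
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            directory = base[: -len(index_name)]    # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))  # same address without the slash
            break
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```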

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
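
What such a “normal” search does can be imitated in Python on a handful of page texts. The pages below are hypothetical and stand in for the full-text database of a search engine; the function name pages_mentioning is likewise an assumption.

```python
# Hypothetical page texts, standing in for the full-text database of a search engine.
PAGES = {
    "http://example.org/links.html":
        "See also http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://example.org/other.html":
        "Nothing relevant here.",
    "http://example.net/review.html":
        "A review of www.computer.com/directory/subdirectory/web-page.html",
}


def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the URL as an exact phrase."""
    # Search without the scheme, because authors often omit "http://" in running text.
    needle = url.split("://", 1)[-1].lower()
    return sorted(address for address, text in pages.items() if needle in text.lower())


print(pages_mentioning(
    "http://www.computer.com/directory/subdirectory/web-page.html", PAGES))
```

This approach also finds pages that mention the address in plain text without making a hyperlink to it.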

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 14

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it started offering many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
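
The mechanism can be sketched in Python as a tiny inverted index. This is a simplified illustration, not the implementation of any real search engine: the pages are hypothetical and stand in for what a crawler would have collected, and the function names build_index and search are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all the words of the query (Boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "marine biology"))
```

A query is answered from the index that was built in advance, not by visiting the pages at query time; this is why such a system is fast but can return links to pages that have changed or disappeared since they were collected.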

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
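
The two ranking rules above can be illustrated with a simplified, PageRank-style calculation. This is only a sketch under stated assumptions: the function name `link_rank`, the damping value 0.85 and the toy link structure are chosen for illustration and are not taken from Google's actual system.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]}.

    Simplified, illustrative link analysis; not Google's real algorithm.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages point to "hub"; only "hub" points to "favoured".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["favoured"],
    "favoured": [],
}
scores = link_rank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example "hub" scores well because many pages point to it, and "favoured" scores well because an "important" page points to it, although it has only one incoming link.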

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
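
A query of this form can also be assembled by a small script. The following is a minimal sketch; the helper name `build_query` is hypothetical, and the script only builds the query string, it does not contact any search system.

```python
def build_query(words, expand=()):
    """Join query words, putting a tilde in front of each word listed in `expand`."""
    expand = set(expand)
    return " ".join("~" + w if w in expand else w for w in words)


# Request synonym expansion for the second word only.
query = build_query(["word1", "word2", "word3", "word4"], expand=["word2"])
```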

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of the search
query that was made
• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of the search
query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of the search
query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
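
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers and both result lists are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical situation: 4 relevant documents exist for the query.
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_results = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_results, relevant)
specialised = precision_recall(specialised_results, relevant)
```

In this invented case the general system reaches precision 0.25 and recall 0.5, while the specialised system reaches 0.75 for both.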

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
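
When a system offers neither truncation nor stemming, the searcher can write out the word variants manually and combine them with OR. The following is a minimal sketch of that manual expansion; the helper names are hypothetical, and the generic Boolean syntax (the OR operator, parentheses for grouping) is an assumption, because operators and grouping rules differ from one search system to another.

```python
def or_group(variants):
    """Combine word variants into one OR group (generic Boolean syntax, assumed)."""
    return "(" + " OR ".join(variants) + ")"


def expand_query(terms, variants_by_term):
    """Replace each term that has listed variants by an OR group of those variants."""
    parts = []
    for term in terms:
        variants = variants_by_term.get(term)
        parts.append(or_group(variants) if variants else term)
    return " ".join(parts)


# The variants are chosen by the searcher; nothing is derived automatically.
query = expand_query(
    ["library", "catalogue", "online"],
    {
        "library": ["library", "libraries"],
        "catalogue": ["catalogue", "catalogues", "catalog"],
    },
)
```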

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) ↔ Client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Diagram: User → an Internet meta-search system → Internet search system 1 (with collected database 1) and Internet search system 2 (with collected database 2) → WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
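
Clustering on the fly can be sketched in a few lines: the grouping is computed from the retrieved snippets only, at query time. This is a deliberately naive illustration and not Vivisimo's actual method; the stop word list, the function names and the sample snippets for the ambiguous query pascal are assumptions made for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "by"}


def tokens(text, query):
    """Return the set of lower-cased words in text, without stop words and the query word."""
    words = (w.strip(".,:;!?()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS and w != query.lower()}


def cluster_results(query, snippets):
    """Group snippets under the term that each shares with the most other snippets."""
    token_sets = [tokens(s, query) for s in snippets]
    doc_freq = Counter(t for ts in token_sets for t in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, token_sets):
        # Pick the term found in the most results; ties are broken alphabetically.
        label = min(ts, key=lambda t: (-doc_freq[t], t)) if ts else "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal the philosopher: life and works",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
]
clusters = cluster_results("pascal", snippets)
```

With these four snippets the sketch produces two groups, labelled "philosopher" and "programming", without any indexing of the documents beforehand.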

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
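
The integration step, with removal of repeated results, can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists (no network access), and the URL normalisation rule (ignore the scheme, a leading "www." and a trailing slash) is an assumption chosen for this example, not the rule of any particular meta-search system.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    return key + "?" + parts.query if parts.query else key


def meta_search(query, engines):
    """Query all engines in parallel threads and merge their results without repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank, so that the top hit of every engine comes first.
    for rank in range(max((len(r) for r in result_lists), default=0)):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical stand-ins for real primary search systems.
def engine_a(query):
    return ["http://www.dogpile.com/", "http://www.kartoo.com"]


def engine_b(query):
    return ["http://dogpile.com/", "http://www.mamma.com", "http://vivisimo.com/"]


merged = meta_search("meta-search", [engine_a, engine_b])
```

Here the address reported by both engines appears only once in the merged list, and the threads reflect the "multi-threaded" idea: all primary systems are queried at the same time instead of one after the other.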

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: sources available through the Internet]
»Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
»Word files, PDF files
»Databases and file archives accessible through the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
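
The difference between a modified page and a new page can be sketched with content fingerprints. This is a minimal illustration under stated assumptions: the function names and the example.org addresses are hypothetical, the page texts are passed in directly instead of being fetched from the WWW, and real alert services may work differently.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the content of a page."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_for_updates(stored, current_pages):
    """Compare the current pages with the fingerprints stored during the previous run.

    stored: dict {url: fingerprint}, updated in place.
    current_pages: dict {url: page text} collected in the current run.
    Returns (new_urls, modified_urls).
    """
    new_urls, modified_urls = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            modified_urls.append(url)
        stored[url] = digest
    return new_urls, modified_urls


stored = {}
# First run: only remember what was seen.
check_for_updates(stored, {"http://example.org/a": "version 1"})
# Second run: one known page has changed and one page is new.
new, modified = check_for_updates(
    stored,
    {"http://example.org/a": "version 2", "http://example.org/b": "first version"},
)
```

An alert service would then send `new` and `modified` to the subscriber, for instance by email.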

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
after a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
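
The procedure above (look up a term in a thesaurus first, then build the query) can be sketched as follows. The thesaurus entries are invented for illustration, and the `OR` query syntax is an assumption: each database has its own query language.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT
    used_for: list = field(default_factory=list)  # UF (synonyms)


# Made-up entry, for illustration only.
THESAURUS = {
    "sea": ThesaurusEntry(
        broader=["body of water"],
        narrower=["inland sea"],
        related=["coast"],
        used_for=["ocean"],
    ),
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and its narrower terms."""
    terms = [term]
    entry = thesaurus.get(term)
    if entry is not None:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))  # sea OR ocean OR "inland sea"
```

Adding synonyms and narrower terms mainly raises recall; choosing a more suitable term from the thesaurus is what helps precision.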

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
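
Both directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists are made up; a real citation index is built from the reference lists of millions of publications.

```python
from collections import defaultdict

# Made-up data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed-2000": ["a-1995", "b-1998"],
    "b-1998": ["a-1995", "c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, references):
    """Backward search: follow reference lists from the seed to older work."""
    found, to_visit = set(), [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> set of citing documents."""
    index = defaultdict(set)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].add(citing)
    return index


older = snowball("seed-2000", REFERENCES)
newer = build_citation_index(REFERENCES)["seed-2000"]
print(sorted(older))  # ['a-1995', 'b-1998', 'c-1990']
print(sorted(newer))  # ['d-2003', 'e-2004']
```

The snowball search needs only the documents themselves, while the forward search needs the inverted index, which is why citation indexes are a separate product.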

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
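
Once the results of a citation search are available as records, the counts per article and per author are a simple tally. The records below are invented; the sketch only shows the counting step, not the search itself.

```python
from collections import Counter

# Made-up records: (citing article, cited article, author of the cited article)
CITATIONS = [
    ("p4", "p1", "Author A"),
    ("p5", "p1", "Author A"),
    ("p5", "p2", "Author A"),
    ("p6", "p3", "Author B"),
]

per_article = Counter(cited for _citing, cited, _author in CITATIONS)
per_author = Counter(author for _citing, _cited, author in CITATIONS)

print(per_article["p1"])       # 2
print(per_author["Author A"])  # 3
```

Such counts are estimates: they depend on which citing publications the database covers.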

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
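
On a small collection of pages (an intranet, for instance) the link counting can be sketched with the Python standard library. The page addresses and HTML snippets are made up, and public search engines do this on their own crawled database rather than on pages that you supply.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target_url, pages):
    """Return the addresses of the pages that contain a link to target_url."""
    citing = []
    for page_url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        if target_url in parser.links:
            citing.append(page_url)
    return citing


# Hypothetical pages: address -> HTML source.
PAGES = {
    "http://a.example/": '<p>See <a href="http://target.example/page.html">this</a>.</p>',
    "http://b.example/": '<a href="http://other.example/">elsewhere</a>',
    "http://c.example/": '<a href="http://target.example/page.html">good page</a>',
}
citing = pages_linking_to("http://target.example/page.html", PAGES)
print(len(citing), citing)
```

The length of the result is the number of "sitations"; the list itself shows who made the links.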

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
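
The variants listed above can be generated automatically before running the link searches. This is a minimal sketch; the function name `url_variants` and the list of index file names are assumptions, and the example address is fictitious.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return common spellings of the same web address."""
    variants = [url]
    for name in index_names:
        if url.endswith("/" + name):
            url = url[: -len(name)]  # drop the file name, keep the slash
            variants.append(url)
            break
    if url.endswith("/"):
        variants.append(url[:-1])  # variant without the trailing slash
    else:
        variants.append(url + "/")  # variant with a trailing slash
    return variants


for variant in url_variants("http://www.example.org/website/index.html"):
    print(variant)
```

Each variant is then searched separately and the results are combined, because a search engine may treat them as different strings.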

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via the WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines
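
The idea of a meta-search engine (send one query to several search engines at the same time and combine the results) can be sketched as follows. The two "engines" are local stand-in functions with made-up results, and the scoring rule (sum of 1/position over the result lists) is just one possible way to merge rankings, chosen for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # Stand-in for a real search engine; returns a ranked list of URLs.
    return ["http://a.example/", "http://b.example/"]


def engine_two(query):
    return ["http://b.example/", "http://c.example/"]


def meta_search(query, engines):
    """Query all engines in parallel threads and merge their rankings."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


print(meta_search("marine science", [engine_one, engine_two]))
# ['http://b.example/', 'http://a.example/', 'http://c.example/']
```

A page found by both engines rises to the top, which also removes the duplicates that a plain concatenation would show.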

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!
23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
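
The difference between the two orderings can be shown with a toy category. The site names and link counts are made up, and the real ranking by Google relies on its own link analysis, not on a plain count like this.

```python
# Made-up directory entries: (site title, number of links received)
ENTRIES = [
    ("Gamma Site", 45),
    ("Alpha Site", 12),
    ("Beta Site", 340),
]

# Alphabetical order, as in the DMOZ directory.
alphabetical = sorted(ENTRIES, key=lambda entry: entry[0].lower())

# Order by links received, most linked first (simplified Google-style order).
by_links = sorted(ENTRIES, key=lambda entry: entry[1], reverse=True)

print([title for title, _ in alphabetical])  # ['Alpha Site', 'Beta Site', 'Gamma Site']
print([title for title, _ in by_links])      # ['Beta Site', 'Gamma Site', 'Alpha Site']
```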

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
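
The two components can be illustrated with a very small inverted index. The pages are given here as a made-up dictionary instead of being collected by a robot, and the search simply requires all query words to occur; real systems add relevance ranking and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs of pages containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, standing in for what a robot would have collected.
PAGES = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
    "http://c.example/": "Civil engineering handbook",
}
INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))          # ['http://a.example/', 'http://b.example/']
print(sorted(search(INDEX, "marine science")))  # ['http://b.example/']
```

The query is answered from the index alone, which is why the search is fast and why the results can be out of date when a page has changed since it was collected.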

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
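
The idea of ranking by links can be sketched as follows. This is a simplified, PageRank-style iteration written for illustration only; the link graph, the damping value and the function name are assumptions, and the sketch does not claim to reproduce the algorithm that Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or highly scored pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this example C ranks highest because three pages point to it, and A ranks above B and D because the "important" page C points to it.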

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
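
What such an automatic expansion amounts to can be sketched in a few lines. The synonym table below is hypothetical and tiny; how Google selects its synonyms is not documented here, so this is only an illustration of the principle.

```python
# Hypothetical, tiny synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query):
    """Replace each word prefixed with ~ by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
# cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.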

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
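
Recall and precision can be made concrete with a small calculation. Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of the relevant documents that are retrieved. The document identifiers below are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 5 documents are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 5 relevant documents were found (recall 0.4).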

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
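
For readers who do not know truncation: a retrieval system that supports it lets a query term such as librar* match all index words that start with the same letters. A minimal sketch, with an invented vocabulary and a hypothetical function name:

```python
def truncation_match(pattern, vocabulary):
    """Return the index words matched by a right-truncated term such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation sign only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical words taken from an index.
vocabulary = ["library", "libraries", "librarian", "liberty", "catalogue"]
print(truncation_match("librar*", vocabulary))
# ['librarian', 'libraries', 'library']
print(truncation_match("library", vocabulary))
# ['library']
```

Without truncation or stemming, a searcher has to type each word variant separately and combine the variants with OR.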

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
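
A proximity operator, as offered by some other retrieval systems, tests whether two words occur close to each other in a text. The sketch below is an illustration only; the function name and the default distance are assumptions, not the syntax of any particular system.

```python
import re


def near(text, word1, word2, max_distance=5):
    """Return True when word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


text = "Open access journals are listed in a directory"
print(near(text, "open", "journals", max_distance=2))   # True
print(near(text, "open", "directory", max_distance=3))  # False
```

A plain AND query would retrieve the text in both cases; the proximity test is stricter and therefore tends to give higher precision.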

96

**--

Meta-search systems:
scheme 1
Diagram components:
»User (In / Out)
»Client computer + WWW client program
»WWW server computer
»Internet / WWW
»WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
Diagram components:
»User (In / Out)
»Client computer + Multi-threaded Internet search client program
»Internet / WWW
»WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Diagram components:
»User (In / Out)
»Client computer + WWW client program
»WWW server computer
»Internet / WWW
»WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
Diagram components:
»User
»an Internet meta-search system
»Internet search system 1, with its collected database 1
»Internet search system 2, with its collected database 2
»WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
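
Clustering on the fly can be illustrated with a deliberately naive sketch: each retrieved title is grouped under the word it shares with the most other titles. This is NOT the method used by Vivisimo, which is not described in this presentation; the stop word list, the titles and the function name are assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, very short stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, titles):
    """Group result titles under the non-query word they share with most other titles."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def words_of(title):
        return [w for w in re.findall(r"[a-z]+", title.lower())
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate word occurs.
    counts = Counter(w for title in titles for w in set(words_of(title)))
    clusters = defaultdict(list)
    for title in titles:
        candidates = words_of(title)
        if candidates:
            # Ties are broken alphabetically so that the result is deterministic.
            label = max(candidates, key=lambda w: (counts[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Free Pascal programming compiler",
    "Pascal the philosopher and his Pensees",
]
for label, group in cluster_results("pascal", titles).items():
    print(label, "->", group)
```

The grouping is computed from the retrieved results only, at search time, so no pre-processing of the documents is needed.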

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Diagram components:
»User (In / Out)
»Client computer + Multi-threaded Internet search client program
»Internet / WWW
»WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
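
The integration of results with removal of repeated results can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query to remote search systems. The URL normalisation is deliberately crude and is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/b/", "http://EXAMPLE.org/c"]


def normalize(url):
    """Crude normalisation: ignore letter case and a trailing slash.
    (Real URL paths can be case-sensitive; this is a simplification.)"""
    return url.lower().rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel threads and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


print(meta_search("open access", [engine_one, engine_two]))
# ['http://example.org/a', 'http://example.org/b', 'http://EXAMPLE.org/c']
```

The queries run in parallel threads, which is where the name "multi-threaded search system" comes from; the merged list keeps the first occurrence of each result.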

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Diagram components:
»telnet, ftp, ...
»Internet / WWW
»Databases and file archives accessible through the Internet
(CGI, ASP,...)
»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files
»PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
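
How a tracking program can detect modified and new pages is sketched below. The sketch compares a stored fingerprint of each page with a fingerprint of its current contents; the page contents are given as plain strings, so fetching the pages and sending the alert by email are left out. All names and URLs are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current page contents.
    Returns (changed URLs, new URLs, updated fingerprints)."""
    changed, new = [], []
    updated = dict(previous)
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
        updated[url] = digest
    return changed, new, updated


# Hypothetical first visit: store the fingerprints.
_, _, stored = detect_changes({}, {
    "http://example.org/news": "Conference in May",
    "http://example.org/about": "About us",
})
# Hypothetical second visit: one page was modified, one page is new.
changed, new, stored = detect_changes(stored, {
    "http://example.org/news": "Conference in June",
    "http://example.org/about": "About us",
    "http://example.org/jobs": "Vacancies",
})
print(changed)  # ['http://example.org/news']
print(new)      # ['http://example.org/jobs']
```

Only the fingerprints have to be stored between visits, not the full text of the pages.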

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
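
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus, its terms and the function name expand_query are all hypothetical, and a real thesaurus system would supply the BT / NT / RT / UF relations.

```python
# Hypothetical mini-thesaurus: every term and relation below is invented for illustration.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],
        "NT": ["Atlantic Ocean", "Pacific Ocean"],
        "RT": ["marine science"],
        "UF": ["sea"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR query combining the term with thesaurus terms of the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that a search engine treats them as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean"))
```

Adding synonyms (UF) and narrower terms (NT) mainly raises recall; restricting the query to narrower terms only would instead favour precision.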

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
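
The two directions of citation searching can be sketched in Python as follows. The citation data and the function names snowball and cited_by are hypothetical, chosen only to illustrate the idea; a real citation index holds millions of such links.

```python
# Hypothetical citation data: each key is a document, each value lists the documents it cites.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, cites=CITES):
    """Backward search: follow reference lists repeatedly to collect older publications."""
    found, queue = [], list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found


def cited_by(doc, cites=CITES):
    """Forward search, as offered by a citation index: list the documents that cite `doc`."""
    return sorted(source for source, refs in cites.items() if doc in refs)


print(snowball("seed"))
print(cited_by("seed"))
```

The length of the list returned by cited_by is the simple citation count that is discussed later as a factor in evaluation.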

211

***-

Information retrieval:
using citations (scheme)
Scheme: a time axis (Past, Now, Future). Starting from the seed
document, snowball citation searching leads to the past and
citation indexing leads to the future.

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
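
A small Python helper can generate such variants, so that none is forgotten when searching for links. This is a sketch under assumptions: the function name url_variants and the list of default file names (index.html, index.htm) are hypothetical choices, and the example address is a placeholder.

```python
def url_variants(url):
    """Return the spellings of a URL that a link search may need to try separately."""
    # Drop the scheme (http://), as in the variants listed above.
    address = url.split("://", 1)[-1]
    variants = [address]
    # Strip a trailing default file name (assumed here: index.html or index.htm).
    for default_name in ("index.html", "index.htm"):
        if address.endswith("/" + default_name):
            address = address[: -len(default_name)]
            variants.append(address)
            break
    # Strip the trailing slash as well.
    if address.endswith("/"):
        variants.append(address.rstrip("/"))
    return variants


for variant in url_variants("http://www.example.org/website/index.html"):
    print(variant)
```

Each variant can then be entered in the link search of the chosen search engine, using the query syntax given in its help pages.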

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
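
The simplified statement above (more received links, higher rank) can be sketched in Python. The link data and the function name rank_by_inlinks are hypothetical, and the actual link analysis by Google is more refined than a plain count of links.

```python
from collections import Counter

# Hypothetical link data: (linking page, linked site) pairs; all names are invented.
LINKS = [
    ("page1", "site_a"), ("page2", "site_a"), ("page3", "site_a"),
    ("page1", "site_b"),
    ("page2", "site_c"), ("page3", "site_c"),
]


def rank_by_inlinks(sites, links=LINKS):
    """Order directory entries by the number of links they receive, highest first."""
    counts = Counter(target for _source, target in links)
    # Sort alphabetically first; the stable sort then keeps that order for ties.
    return sorted(sorted(sites), key=lambda site: -counts[site])


print(rank_by_inlinks(["site_b", "site_c", "site_a"]))
```

Sites with equal counts stay in alphabetical order, which corresponds to the plain alphabetical listing in DMOZ.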

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
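
The mechanism described above can be sketched in Python as a very small inverted index. The pages, URLs and function names below are hypothetical; a real system collects the pages with a crawler and handles far larger volumes.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would have collected.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology resources for students",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words; only the stored index is consulted."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "hydrology resources"))
```

The query is answered from the prebuilt index alone, which is why such systems respond quickly but may miss pages that changed after the last crawl.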

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
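
The two ranking rules above (many pages point to it, "important" pages point to it) can be illustrated with a simplified link-based scoring loop. This is only a sketch in the spirit of such algorithms, not Google's actual method; the damping value, the iteration count and the tiny link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages count more.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Hypothetical link graph: page "c" receives the most links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(links)
print(sorted(scores, key=scores.get, reverse=True))
```

In this graph "c" ends up first because three pages point to it, and "a" comes second because the highly ranked "c" points to it, even though "a" has only one incoming link.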

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
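
As a small illustration of both approaches, the sketch below builds a query string with a tilde in front of selected words, and a manually expanded query in which hand-picked synonyms are combined with OR. The function names and the synonym list are hypothetical, and the OR syntax with parentheses is an assumption about the target search system, which should be checked before use.

```python
def mark_for_synonyms(words, expand):
    """Put a tilde in front of each word that should be expanded automatically."""
    return " ".join("~" + w if w in expand else w for w in words)


def expand_manually(words, synonyms):
    """Replace each word that has hand-picked synonyms by an OR group."""
    parts = []
    for w in words:
        alternatives = [w] + synonyms.get(w, [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(w)
    return " ".join(parts)


print(mark_for_synonyms(["word1", "word2", "word3", "word4"], {"word2"}))
print(expand_manually(["cheap", "car"], {"car": ["automobile"]}))
```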

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
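
The terms recall and precision used above can be made concrete with a short calculation. This is a minimal sketch using the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist,
# and 2 of the retrieved documents are relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```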

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram labels: User; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; In; Out]

97

**--

Meta-search systems:
scheme 2
[Diagram labels: User; Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; In; Out]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram labels: User; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; In; Out]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
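
To give an impression of what grouping results "on the fly" means, here is a deliberately naive sketch: it looks only at the titles returned for one query and groups them under the most widely shared word. This is not Vivisimo's algorithm, which is not described here; the stopword list, the function name and the sample titles are assumptions made for the illustration.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}


def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def words(title):
        return [w for w in re.findall(r"[a-z]+", title.lower())
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate heading word appears.
    frequency = Counter(w for t in titles for w in set(words(t)))
    clusters = defaultdict(list)
    for t in titles:
        candidates = words(t)
        if candidates:
            # Prefer the most widely shared word; break ties alphabetically.
            heading = min(candidates, key=lambda w: (-frequency[w], w))
        else:
            heading = "other"
        clusters[heading].append(t)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Thoughts of the philosopher Blaise Pascal",
]
for heading, group in cluster_results(titles, "pascal").items():
    print(heading, group)
```

Nothing is prepared before the search: the headings are derived from the retrieved results only, which is the property the slide emphasises.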

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram labels: User; Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; In; Out]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
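
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, assuming each primary search system returns a ranked list of URLs; the normalisation rule (ignore "www.", letter case of the host, a trailing slash and any query string) is a simplifying assumption, and the function names are hypothetical.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    seen, merged = set(), []
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is None:
                continue  # this list is shorter than the others
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


engine_a = ["http://www.dogpile.com/", "http://vivisimo.com/"]
engine_b = ["http://dogpile.com", "http://www.kartoo.com", "http://www.mamma.com"]
print(merge_results(engine_a, engine_b))
```

Interleaving by rank keeps the top result of every source near the top of the merged list; other merging strategies are equally possible.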

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram labels:]
• telnet, ftp, ...
• Internet / WWW
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a search in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
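
How such a tracking service can tell "changed" from "new" pages can be sketched with stored fingerprints of page contents. This is only an illustration under simple assumptions: the page texts are passed in directly instead of being fetched, the function names are hypothetical, and any change in the text, however small, counts as an update.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-size summary of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_pages(stored, current):
    """Compare stored fingerprints with the current page contents.

    stored maps URL -> fingerprint from the previous visit (updated in place);
    current maps URL -> page text as found now.
    Returns (changed_urls, new_urls).
    """
    changed, new = [], []
    for url, text in current.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new


stored = {}
check_pages(stored, {"http://example.org/a": "first version"})
print(check_pages(stored, {
    "http://example.org/a": "second version",
    "http://example.org/b": "brand new page",
}))
```

An alert service would run such a comparison periodically and send the two lists to the subscriber by email.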

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
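
The matching of new books against a stored interest profile can be sketched as follows. This is a minimal illustration and not the method of any particular bookshop; the field names ("title", "category", "keywords", "categories") and the sample records are hypothetical.

```python
def matches_profile(book, profile):
    """True when the book shares a keyword or a subject category with the profile."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


profile = {"keywords": ["fisheries"], "categories": ["Oceanography"]}
new_books = [
    {"title": "Fisheries management", "category": "Biology"},
    {"title": "Tides", "category": "Oceanography"},
    {"title": "Roman history", "category": "History"},
]
print(new_book_alerts(new_books, profile))
```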

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer free access to
bibliographies of articles published in journals from many
publishers.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term, a thesaurus can indicate:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
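
A minimal sketch of this use of a thesaurus, assuming an invented thesaurus entry and assuming that the target database accepts quoted phrases combined with OR (the real query syntax depends on the search system):

```python
# Hypothetical thesaurus entry, using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found in the thesaurus.

    Adding synonyms (UF) and narrower terms (NT) aims at higher recall;
    leaving out broader terms (BT) avoids lowering precision.
    """
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("sea")
```

The choice of relations is a design decision for the searcher: including RT or BT terms widens the search further, at the cost of more irrelevant results.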

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
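
Both directions can be illustrated with a small, invented citation network. The document identifiers and reference lists below are hypothetical; a real citation index is built from the reference lists of many journals.

```python
# Hypothetical data: each document maps to the documents that it cites.
REFERENCES = {
    "seed_2000": ["a_1995", "b_1998"],
    "b_1998": ["a_1995", "c_1990"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "d_2003"],
}


def snowball(doc, refs=REFERENCES):
    """Follow reference lists backwards in time (snow-ball effect)."""
    found, todo = set(), [doc]
    while todo:
        for cited in refs.get(todo.pop(), []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def build_citation_index(refs=REFERENCES):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_list in refs.items():
        for cited in cited_list:
            index.setdefault(cited, set()).add(citing)
    return index


older = snowball("seed_2000")
newer = build_citation_index().get("seed_2000", set())
```

The reference lists alone are enough to go back in time; going forward requires the inverted index, which is why a citation index has to be produced as a separate database.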

211

***-

Information retrieval:
using citations (scheme)
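[Scheme: a time axis running from past to future, with the seed document and “now” marked on it. Snowball citation searching leads from the seed document back to older publications; citation indexing leads forward to more recent publications that cite the seed document.]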

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
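
Generating these variants can be automated. The sketch below is an assumption about how one might do this; the function name `url_variants` is invented, and the resulting strings still have to be placed in the link query syntax of the chosen search system.

```python
def url_variants(url):
    """Return the spellings of a URL that other pages may use to link to it."""
    # Drop the scheme ("http:") but keep the leading "//", as in the list above.
    without_scheme = "//" + url.split("//", 1)[-1]
    variants = [without_scheme]
    if without_scheme.endswith("/index.html"):
        directory = without_scheme[: -len("index.html")]  # keeps the final "/"
        variants.append(directory)
        variants.append(directory.rstrip("/"))
    elif without_scheme.endswith("/"):
        variants.append(without_scheme.rstrip("/"))
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Only `index.html` is treated as a default file name here; a web server may use another default name, so this list is not necessarily complete.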

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
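
The simplified statement above can be sketched as follows. The link data and site names are invented, and counting incoming links is only the simplification given on this slide, not the actual link analysis performed by Google.

```python
from collections import Counter

# Hypothetical link data: page -> sites that the page links to.
LINKS = {
    "p1": ["site-b", "site-c"],
    "p2": ["site-b"],
    "p3": ["site-b", "site-a"],
    "p4": ["site-c"],
}


def inlink_counts(links):
    """Count how many links each site receives."""
    return Counter(target for targets in links.values() for target in targets)


def rank_directory(entries, links):
    """Sort directory entries by links received; ties stay alphabetical."""
    counts = inlink_counts(links)
    return sorted(entries, key=lambda entry: (-counts.get(entry, 0), entry))


alphabetical = ["site-a", "site-b", "site-c"]
ranked = rank_directory(alphabetical, LINKS)
```

The same entries appear in both versions of the directory category; only the order differs.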

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: a global subject directory, which covers the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
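The mechanism described above can be sketched in a few lines. This is only a minimal illustration, assuming a tiny set of hypothetical pages (the example.org URLs and texts are invented) in place of what a real "robot" would collect; real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering portal",
    "http://example.org/c": "Oceanography data for marine engineering",
}
index = build_index(pages)
print(sorted(search(index, "marine oceanography")))
```

The query is answered from the pre-built index only; the pages themselves are not contacted at search time, which is the point made on the previous slides.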

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
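The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified link-based score. This is a sketch of the general principle only, not Google's actual algorithm; the page names, the damping value and the number of iterations are assumptions chosen for the illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively score pages from a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * scores[page] / n
        scores = new
    return scores

# Hypothetical link structure: A, B and D point to C; C points to A.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = link_scores(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small example C scores highest because three pages point to it, and A scores higher than B and D because the "important" page C points to it.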

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
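What such automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and very small, and the way Google itself selects synonyms is not documented here; the sketch only shows the general idea of turning a marked word into a group of alternatives.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("rental ~car prices"))
# prints: rental (car OR automobile OR vehicle) prices
```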

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
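Recall and precision can be computed once it is known which documents are relevant. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall 2/3).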

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
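To make concrete what manual truncation means in systems that do offer it, here is a minimal sketch of right truncation with a trailing asterisk. The vocabulary is a hypothetical word list; the asterisk syntax is a common convention in bibliographic systems, not a Google feature.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a pattern with right truncation, e.g. 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without an asterisk only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical vocabulary of indexed words.
vocabulary = ["library", "libraries", "librarian", "liberty", "catalogue"]
print(truncation_match("librar*", vocabulary))
# prints: ['librarian', 'libraries', 'library']
```

Without truncation, the searcher has to type each word form separately to obtain the same coverage.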

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
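A proximity operator such as NEAR checks whether two words occur within a given number of words of each other. The following is a minimal sketch of that check on a single text; the default distance of 5 words and the sample sentence are assumptions for the illustration.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

text = "Open access journals improve access to scientific information for libraries"
print(near(text, "open", "journals", max_distance=2))   # the words are 2 positions apart
print(near(text, "open", "libraries", max_distance=3))  # the words are 9 positions apart
```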

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
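Grouping results under headings at query time can be illustrated with a deliberately naive sketch: each result title is filed under the word it shares with the most other titles. This is not Vivisimo's method, which is not described here; the titles, the stopword list and the heading rule are assumptions for the illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}

def cluster_results(query, titles):
    """Group result titles under the most common word they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    words_per_title = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        words_per_title.append(words - STOPWORDS - query_words)
    # Count in how many titles each candidate heading word occurs.
    counts = Counter(w for words in words_per_title for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        if words:
            # Pick the word shared with the most titles; ties break alphabetically.
            heading = min(words, key=lambda w: (-counts[w], w))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", titles)
for heading, members in clusters.items():
    print(heading, "->", members)
```

The grouping is computed from the retrieved titles only, so nothing has to be prepared before the search is made.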

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
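The integration step, merging ranked result lists and removing repeated results, might look like the sketch below. The URL normalisation is deliberately crude (it lower-cases the whole address and ignores "www." and a trailing slash), and the interleaving rule is an assumption; real meta-search systems use their own merging strategies.

```python
def normalize(url):
    """Reduce trivial URL variants so duplicates can be recognised."""
    url = url.strip().lower()  # simplification: paths can be case-sensitive
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists returned by two primary search systems.
engine_a = ["http://www.vivisimo.com/", "http://www.kartoo.com"]
engine_b = ["http://vivisimo.com/", "http://www.mamma.com"]
print(merge_results([engine_a, engine_b]))
```

The two spellings of the Vivisimo address are recognised as the same result, so it appears only once in the merged list.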

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
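The difference between a modified page and a new page can be shown with a small sketch that compares two snapshots of page contents. The URLs and texts are hypothetical, and a real alert service would fetch the pages over the network and store the fingerprints between runs; that part is left out here.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def compare_snapshots(old, new):
    """Compare two {url: content} snapshots; report modified and new URLs."""
    modified = sorted(
        url for url in new
        if url in old and fingerprint(old[url]) != fingerprint(new[url])
    )
    added = sorted(url for url in new if url not in old)
    return modified, added

# Hypothetical snapshots taken at two different moments.
old = {
    "http://example.org/news": "Conference in May",
    "http://example.org/about": "About us",
}
new = {
    "http://example.org/news": "Conference moved to June",
    "http://example.org/about": "About us",
    "http://example.org/papers": "New paper list",
}
modified, added = compare_snapshots(old, new)
print("modified:", modified)
print("new:", added)
```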

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
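How a stored interest profile can be matched against newly published books is sketched below. The record fields ("title", "category"), the profile layout and the sample books are assumptions made for the illustration; they do not describe how Amazon or any other service implements its alerts.

```python
def matches_profile(book, profile):
    """True if a book matches a keyword or a subject category in the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

def new_book_alerts(new_books, profile):
    """Return the titles of new books that fit the user's interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]

# Hypothetical interest profile and list of newly published books.
profile = {"keywords": ["oceanography"], "categories": ["Fisheries"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Trawling Methods", "category": "Fisheries"},
    {"title": "Civil Engineering Handbook", "category": "Engineering"},
]
print(new_book_alerts(new_books, profile))
# prints: ['Introduction to Oceanography', 'Trawling Methods']
```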

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
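Simultaneous searching can be sketched with a thread pool that sends the same query to several catalogues in parallel. The catalogues below are hypothetical in-memory lists; a real client would query each target system over the network (for example via Z39.50 or a WWW form), which is where the parallelism saves time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of libraries and bookdealers.
CATALOGUES = {
    "Library A": ["Marine biology", "Ocean currents"],
    "Library B": ["Ocean currents", "Coastal engineering"],
    "Bookshop C": ["Harbour design"],
}

def search_catalogue(name, term):
    """Return (catalogue name, matching titles) for one catalogue."""
    hits = [t for t in CATALOGUES[name] if term.lower() in t.lower()]
    return name, hits

def search_all(term):
    """Query all catalogues in parallel and collect the hits per catalogue."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, term) for name in CATALOGUES]
        return dict(f.result() for f in futures)

print(search_all("ocean"))
```

Only a simple substring search is offered across all targets, which mirrors the point above that search options in such systems are rather limited.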

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined coverage of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

(Scheme: Author / Sender, Editor, Reader / Receiver)

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
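As an illustration of this way of working, the following Python sketch expands a single search term with terms found in a thesaurus and turns them into a Boolean OR query. The thesaurus fragment is invented for the example (it is not taken from a real thesaurus), and the function name expand_query and the OR syntax are assumptions; the query syntax of the target database must be checked in its own help pages.

```python
# A tiny, made-up thesaurus fragment; the terms and relations are
# illustrative assumptions, not taken from a real thesaurus.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],        # synonym(s)
        "BT": ["water body"],   # broader term(s)
        "NT": ["coastal sea"],  # narrower term(s)
        "RT": ["tide"],         # related term(s)
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return a Boolean OR query of the term plus the selected related terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put multi-word terms between quotes so they are searched as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)

print(expand_query("sea", THESAURUS))
```

By default only synonyms (UF) and narrower terms (NT) are added, which tends to increase recall without widening the subject; adding broader or related terms is possible through the relations parameter, but may lower precision.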

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
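Both directions of citation searching can be illustrated with a small Python sketch. The citation data below are invented, and the function names snowball and build_citation_index are chosen only for this example. Following reference lists leads to older publications; inverting the reference lists, which is in essence what a citation index does, leads to more recent publications that cite the seed document.

```python
# Toy citation data: each document maps to the documents it cites.
# The identifiers are invented for illustration.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old1"],
}

def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            queue.extend(cites.get(doc, []))  # snowball effect
    return found

def build_citation_index(cites):
    """Invert the reference lists: for each document, which documents cite it?"""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, []).append(citing)
    return index

older = snowball("seed", CITES)
newer = build_citation_index(CITES).get("seed", [])
print(older)
print(newer)
```

The length of a list in the inverted index is the number of citations received by that document within the data set, so the same structure also supports simple citation counts.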

211

***-

Information retrieval:
using citations (scheme)
(Scheme on a time axis, with Past, Now and Future:
starting from the seed document, snowball citation searching
leads to the past; citation indexing leads to the future.)

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
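The variants listed above can be generated with a short Python helper, shown here as a sketch. The function name url_variants is invented, and the assumption that the default page is called index.html (or index.htm) will not hold for every web server. The link-search operator itself differs from one search engine to another and is therefore not included.

```python
def url_variants(url):
    """List common spellings of the same address for use in a link search."""
    base = url
    # Strip a default page name, if present (assumed: index.html or index.htm).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search system, and the results combined.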

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
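The difference between the two orderings can be shown with a few lines of Python. The entries and their link counts are invented, and sorting by a raw count of inbound links is only a simplification of the "simply stated" description above; the link analysis actually used by Google is more elaborate.

```python
# Invented example data: directory entries with a count of inbound links.
entries = [
    {"title": "Alpha Oceanography", "inlinks": 12},
    {"title": "Marine Data Centre", "inlinks": 340},
    {"title": "Zeta Sea Research", "inlinks": 95},
]

# Ordering as in the DMOZ directory: alphabetical by title.
alphabetical = sorted(entries, key=lambda e: e["title"])

# Simplified link-based ordering: most linked-to sites first.
by_links = sorted(entries, key=lambda e: e["inlinks"], reverse=True)

print([e["title"] for e in alphabetical])
print([e["title"] for e in by_links])
```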

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

(Scheme: within the complete WWW,
a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”))

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
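
The mechanism described above can be sketched as a small inverted index. This is a simplified illustration under stated assumptions: the pages are a hypothetical in-memory dictionary instead of documents collected by a robot, and the query logic is a plain AND of all words.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages standing in for the output of a crawler.
pages = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Marine engineering handbook",
    "http://example.org/c": "Civil engineering",
}
index = build_index(pages)
hits = search(index, "marine engineering")
print(hits)
```

The query is answered from the prebuilt index only; no page is fetched at search time, which is why such systems can respond quickly.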

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
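
The two ranking criteria above ("many" and "important" pages pointing to a page) can be illustrated with a simplified PageRank-style iteration. This is a sketch under assumptions: the link graph, the damping value 0.85 and the number of iterations are illustrative choices, and the real Google algorithm combines many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; links maps each page to the pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: A, B and D point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph, C ranks highest because many pages point to it, and A ranks above B and D because the single link it receives comes from the "important" page C.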

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has been possible since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
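
How such a tilde expansion could work in principle is sketched below. The synonym table is hypothetical and the function only shows the idea of replacing a marked word by a group of alternatives; it does not reproduce how Google implements the operator.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn a query into groups of alternative terms (OR within a group)."""
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            # The marked word is kept and its synonyms are added.
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([token])
    return groups

groups = expand_query("used ~car prices")
print(groups)
```

Only the word marked with the tilde is expanded; the other words are searched as typed.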

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
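
The two measures named above, recall and precision, can be computed as follows. The document identifiers are hypothetical; precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set and relevance judgements.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).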

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
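
To make clear what is missing, here is a sketch of manual right-hand truncation as offered by many classical retrieval systems. The asterisk as truncation symbol and the small vocabulary are assumptions for the illustration.

```python
def expand_truncated(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        # Every indexed word that starts with the given prefix matches.
        return sorted(word for word in vocabulary if word.startswith(prefix))
    return [term] if term in vocabulary else []

# Hypothetical vocabulary of indexed words.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
matches = expand_truncated("librar*", vocabulary)
print(matches)
```

Without truncation, the searcher has to type each word variant explicitly and combine the variants with OR.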

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
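
A proximity operator such as NEAR can be sketched as a check on word positions. The default distance of 5 words is an assumption; real systems use their own limits and syntax.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "Open access repositories give free access to scientific journal articles"
print(near(text, "scientific", "articles", max_distance=3))
print(near(text, "open", "articles", max_distance=5))
```

A plain AND query would accept both word pairs, because it ignores how far apart the words are in the document.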

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
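
Clustering on the fly can be illustrated with a toy grouping of result titles by the words they share. This is only a sketch: the titles and the stopword list are hypothetical, and the method is much simpler than the actual Vivisimo software.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "his"}

def cluster_results(query, titles):
    """Group result titles under each non-query word shared by two or more titles."""
    query_words = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        for word in words - query_words - STOPWORDS:
            clusters[word].append(title)
    # Keep only the labels that really group several results.
    return {label: group for label, group in clusters.items() if len(group) >= 2}

# Hypothetical titles returned for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", titles)
for label, group in sorted(clusters.items()):
    print(label, group)
```

The grouping is computed from the retrieved titles only, at the moment of the search, so no pre-processing of the document collection is needed.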

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
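
The integration of results with removal of repeated results can be sketched as follows. The normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) is an assumption; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path, query."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(*result_lists):
    """Merge result lists in order and keep the first copy of each result."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com/", "http://www.mamma.com"]
merged = merge_results(engine_1, engine_2)
print(merged)
```

The two spellings of the same address are recognised as one result, so it is shown only once.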

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
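
One common way to detect changed or new pages is to store a fingerprint of each page and to compare it at the next visit. The sketch below assumes that the page texts have already been downloaded; the URLs and texts are hypothetical, and existing alert services may work differently.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of the text of a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """Compare stored fingerprints (url -> hash) with current texts (url -> text)."""
    report = {}
    for url, text in current.items():
        if url not in previous:
            report[url] = "new"
        elif previous[url] != fingerprint(text):
            report[url] = "changed"
    return report

# Fingerprints stored at the previous visit (hypothetical pages).
previous = {
    "http://example.org/news": fingerprint("Conference in May 2005"),
    "http://example.org/about": fingerprint("About us"),
}
# Page texts as retrieved at the current visit.
current = {
    "http://example.org/news": "Conference moved to June 2005",
    "http://example.org/about": "About us",
    "http://example.org/new-page": "A new page",
}
report = changed_pages(previous, current)
print(report)
```

Only the fingerprints have to be stored, not the complete earlier versions of the pages.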

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
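
Matching new books against a stored interest profile can be sketched as follows. The book records and the profile are hypothetical, and a real service such as the one named above may match in a different way.

```python
def matching_books(profile_keywords, new_books):
    """Return the titles of new books that match at least one profile keyword."""
    keywords = {keyword.lower() for keyword in profile_keywords}
    alerts = []
    for book in new_books:
        # Compare both the title words and the subject categories.
        terms = set(book["title"].lower().split())
        terms |= {subject.lower() for subject in book["subjects"]}
        if keywords & terms:
            alerts.append(book["title"])
    return alerts

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "subjects": ["marine science"]},
    {"title": "Modern Cookery", "subjects": ["cooking"]},
    {"title": "Information Retrieval Basics", "subjects": ["library science"]},
]
alerts = matching_books(["oceanography", "library science"], new_books)
print(alerts)
```

The first book matches on a keyword in its title, the third one on a subject category.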

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
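
The idea of a union catalogue can be sketched in a few lines of Python, assuming that records from different libraries can be matched on a shared identifier such as an ISBN. The library names, identifiers and titles below are made up, and real union catalogues such as Copac need far more careful record matching.

```python
def build_union_catalogue(catalogues):
    """Merge several library catalogues into one, keyed by identifier.

    `catalogues` maps a library name to a list of (isbn, title) records.
    Each merged entry remembers which libraries hold the book.
    """
    union = {}
    for library, records in catalogues.items():
        for isbn, title in records:
            entry = union.setdefault(isbn, {"title": title, "held_by": []})
            entry["held_by"].append(library)
    return union


# Made-up catalogues with shortened, fictitious identifiers.
catalogues = {
    "Library A": [("111", "Marine Ecology"), ("222", "Hydrology Basics")],
    "Library B": [("222", "Hydrology Basics"), ("333", "Coastal Engineering")],
}
union = build_union_catalogue(catalogues)
print(union["222"]["held_by"])
```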

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
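
A minimal sketch of simultaneous searching, assuming three hypothetical target systems that are simulated here by small in-memory functions. It shows why the result depends on the availability of the targets: a target that is down simply contributes nothing.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library_a(query):
    # Hypothetical target system: a tiny in-memory catalogue.
    titles = ["Marine Ecology", "Ocean Currents"]
    return [t for t in titles if query.lower() in t.lower()]


def search_bookdealer_b(query):
    # Hypothetical second target system.
    titles = ["Ocean Currents", "Ocean Law"]
    return [t for t in titles if query.lower() in t.lower()]


def search_unavailable(query):
    # Simulates a target system that is not available.
    raise ConnectionError("target system not available")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results."""
    def safe_search(target):
        try:
            return target(query)
        except ConnectionError:
            return []  # an unavailable target contributes nothing

    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(safe_search, targets))

    merged = []
    for results in result_lists:
        for title in results:
            if title not in merged:  # drop duplicates across targets
                merged.append(title)
    return merged


targets = [search_library_a, search_bookdealer_b, search_unavailable]
print(meta_search("ocean", targets))
```

Only a plain keyword query is passed on, which mirrors the limited search options mentioned above: the meta-search layer can offer only what every target system understands.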

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been enriched with
descriptors (words and terms) taken from a controlled
list of words and terms, the searcher can first consult
one or several thesaurus systems to find more, and more
suitable, words and terms;
the searcher can then use these to formulate a query
for that database (to increase recall and precision).
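
The use of a thesaurus before searching can be sketched as follows. The tiny thesaurus, its terms and the function name are illustrative assumptions only; a real thesaurus system offers far more terms and relations.

```python
# A toy thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],  # broader term
        "NT": ["inland sea"],     # narrower term
        "RT": ["coast"],          # related term
        "UF": ["ocean"],          # synonym / non-preferred term
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from the term plus the thesaurus terms
    found for the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("sea"))
```

Adding synonyms and narrower terms with OR mainly serves recall. Choosing a more suitable term instead of a vague one is what helps precision, and that choice remains with the searcher.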

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
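
Both directions of citation searching can be sketched on a small, made-up citation network. The document identifiers and function names are hypothetical; a real citation index is of course built from the reference lists of many thousands of publications.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed_2000": ["old_1990", "old_1995"],
    "old_1995": ["old_1985"],
    "new_2003": ["seed_2000"],
    "new_2004": ["seed_2000", "old_1995"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time (snow-ball effect),
    collecting all older documents reachable from the seed."""
    found = []
    to_visit = list(cites.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(cites.get(doc, []))
    return found


def cited_by(seed, cites):
    """Look up which documents cite the seed, as a citation index does."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print(snowball("seed_2000", CITES))
print(cited_by("seed_2000", CITES))
```

The first function needs only the reference lists of the documents at hand. The second needs an index over all citing documents, which is why it depends on a citation index service.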

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future). Starting from the seed
document, snowball citation searching points to the past and
citation indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
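
Counting citations from the results of a citation search can be sketched as follows, assuming the search yields pairs of citing and cited articles. All identifiers and author names are made up, and the counts are only as complete as the database that was searched.

```python
from collections import Counter

# Hypothetical search results: (citing article, cited article) pairs.
CITATIONS = [("c1", "a1"), ("c2", "a1"), ("c3", "a2"), ("c3", "a1")]
# Hypothetical mapping from each cited article to its author.
AUTHOR_OF = {"a1": "Author X", "a2": "Author Y"}


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _, cited in citations)


def citations_per_author(citations, author_of):
    """Add up the citations received by all articles of each author."""
    counts = Counter()
    for _, cited in citations:
        counts[author_of[cited]] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, AUTHOR_OF))
```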

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
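
The variants listed above can be generated automatically before running the link searches. This is a small sketch with a hypothetical helper function; it only handles the `index.html` case shown on this slide.

```python
def url_variants(url):
    """Return the common variants under which a page may be linked."""
    variants = [url]
    if url.endswith("/index.html"):
        with_slash = url[: -len("index.html")]   # ends with "/"
        variants.append(with_slash)
        variants.append(with_slash.rstrip("/"))  # without the final "/"
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then used in a separate link search, and the results are combined.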

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 19


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
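
The difference between the two orderings can be sketched as follows. The entries and link counts are made up, and counting raw inbound links is only the simplified view stated above; Google's actual link analysis is more involved than this.

```python
# Hypothetical directory category with made-up inbound link counts.
entries = [
    {"title": "Alpha Marine Lab", "inbound_links": 12},
    {"title": "Beta Ocean Data", "inbound_links": 340},
    {"title": "Gamma Sea Atlas", "inbound_links": 95},
]


def alphabetical(entries):
    """Order the entries by title, as in the DMOZ directory."""
    return [e["title"] for e in sorted(entries, key=lambda e: e["title"])]


def by_links(entries):
    """Order the entries by number of received links, highest first."""
    ranked = sorted(entries, key=lambda e: e["inbound_links"], reverse=True)
    return [e["title"] for e in ranked]


print(alphabetical(entries))
print(by_links(entries))
```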

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and locate
many items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
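
The two components described above (an index built automatically from page texts, and a search function over that index) can be sketched as follows. This is a minimal illustration, not the design of any real search engine: the three pages are invented stand-ins for what a robot would collect, and requiring every query word to match is a choice made for this sketch.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented pages standing in for documents collected by the robot.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(search(index, "marine hydrology"))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time.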

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
does not build its own unique database of WWW sites
anymore, but relies on the same WWW database as
the earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
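
The idea that both the number and the importance of pointing pages count can be illustrated with a simplified link-analysis iteration. The sketch below is an assumption-laden toy: the four-page graph, the damping value of 0.85 and the function name `link_rank` are illustrative choices, and the code does not claim to reproduce the algorithm Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages from the link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it points to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented graph: A and B both point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C scores well because two pages point to it, and D scores well although only one page points to it, because that one page (C) is itself highly ranked.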

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
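
What such an expansion amounts to can be sketched as a rewrite of the query, in which each word marked with a tilde becomes an OR-group of the word and its synonyms. The synonym table and the function `expand_query` are hypothetical; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a far larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query):
    """Turn each ~word into an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) == 1:
                parts.append(word)  # no synonyms known: keep the plain word
            else:
                parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("rent ~car brussels"))
```

Words without a tilde are passed through unchanged, so the searcher decides which concepts are broadened.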

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company,
Yahoo!, since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
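
Recall and precision can be computed directly once it is known which documents were retrieved and which are relevant. The sketch below uses the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 4 documents retrieved, 3 documents relevant, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

A specialised system aims to raise both values at once: fewer irrelevant hits (precision) and fewer missed relevant documents (recall).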

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
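
For readers unfamiliar with truncation, the following sketch shows what a right-truncated query term such as librar* would match in a system that supports it. The vocabulary and the function `truncation_match` are illustrative assumptions, not a feature of any particular search engine.

```python
def truncation_match(pattern, vocabulary):
    """Return the vocabulary words matched by a pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(truncation_match("librar*", vocabulary))
print(truncation_match("library", vocabulary))
```

Without truncation, the searcher has to type each variant (library, libraries, librarian) explicitly to obtain the same coverage.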

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
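
A proximity operator tests whether two words occur close to each other in a text. The sketch below shows one possible interpretation; the function `near` and its default distance of 5 words are assumptions for illustration, since systems that offer NEAR define the distance in their own way.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

print(near("open access journals in marine science", "open", "journals", 2))
print(near("open archives are set up by many scientific journals", "open", "journals", 3))
```

Both texts contain both words, so a plain AND query retrieves both; only the proximity test separates the text in which the words belong together.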

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
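
Clustering on the fly means that the grouping is computed from the retrieved results themselves at search time. The sketch below applies a deliberately naive heuristic (each result is filed under the non-query term it shares with the most other results). It is not the algorithm used by Vivisimo; the snippets, the stopword list and the function `cluster_results` are invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "by", "on"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query term."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokenized = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenized.append(words - STOPWORDS - query_words)
    # Count in how many results each candidate heading occurs.
    doc_freq = Counter(w for words in tokenized for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        if words:
            # Most widely shared term wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pensees by the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

Only the text of the hits returned for this one query is analysed, so no index of cluster labels has to be prepared in advance.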

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
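
Integration with removal of repeated results can be sketched as merging several ranked lists while keeping only the first copy of each URL. The normalisation rule (ignore letter case in the host name, a leading "www." and a trailing slash) and the round-robin merging order are assumptions made for this sketch; real meta-search systems may merge and compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', plus path."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Invented result lists from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.org/ocean"]
engine_2 = ["http://example.org", "http://example.net/fish"]
print(merge_results(engine_1, engine_2))
```

Taking one result from each engine in turn keeps the top hits of every primary system near the top of the merged list.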

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
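
The core of such a tracking service can be sketched as a comparison of two snapshots of a set of pages. The sketch assumes that the page contents have already been fetched and stored as text; the URLs, the contents and the function names are invented, and sending the alert itself is left out.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots {url: content}; return (new URLs, modified URLs)."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Invented snapshots taken at two different moments.
previous = {"http://example.org/a": "old text", "http://example.org/b": "same"}
current = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same",
    "http://example.org/c": "brand new",
}
new_pages, modified = detect_changes(previous, current)
print("new:", new_pages)
print("modified:", modified)
```

Storing only the digest of each page, instead of its full text, is enough to notice that a page has changed, although not what has changed.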

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
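
The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration in Python, not the method of any real bookshop: the class names, the sample books and the simple word matching are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile (hypothetical): keywords and/or subject categories."""
    keywords: set[str] = field(default_factory=set)
    subjects: set[str] = field(default_factory=set)


@dataclass
class Book:
    """A newly published book (hypothetical record layout)."""
    title: str
    subjects: set[str]


def matches(profile: InterestProfile, book: Book) -> bool:
    """Return True when a keyword occurs in the title or a subject category is shared."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    subject_hit = bool(
        {s.lower() for s in profile.subjects} & {s.lower() for s in book.subjects}
    )
    return keyword_hit or subject_hit


def alerts(profile: InterestProfile, new_books: list[Book]) -> list[str]:
    """Titles of the new books about which the user should be alerted."""
    return [book.title for book in new_books if matches(profile, book)]
```

A real service would run such a comparison each time new titles are added to its database and send the resulting list to the user, for instance by email.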

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
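
How separate catalogues can be merged into a union catalogue is sketched below in Python. This is an illustration only: the record layout, the library names and the identifiers are invented, and matching records on ISBN is an assumption (real union catalogues use more elaborate matching rules).

```python
def union_catalogue(catalogues):
    """Merge the records of several libraries into one catalogue.

    `catalogues` maps a library name to a list of records; each record is a
    dict with the (assumed) keys "isbn" and "title".
    Records with the same ISBN are merged and the holding libraries are listed.
    """
    merged = {}
    for library, records in catalogues.items():
        for record in records:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "held_by": []}
            )
            if library not in entry["held_by"]:
                entry["held_by"].append(library)
    return merged


# Invented sample data, for illustration only.
SAMPLE = {
    "library_a": [{"isbn": "id-0001", "title": "Marine Biology"}],
    "library_b": [
        {"isbn": "id-0001", "title": "Marine Biology"},
        {"isbn": "id-0002", "title": "Coastal Ecology"},
    ],
}
```

Calling `union_catalogue(SAMPLE)` gives one entry per book, each with the list of libraries that hold it.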

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
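
Simultaneous searching can be sketched in Python as sending the same query to several target systems in parallel and merging the answers. The catalogues below are small in-memory stand-ins with invented contents; a real meta-search service would contact each target system over the network (for instance with Z39.50) and would have to cope with targets that are slow or unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues: source name -> list of titles.
CATALOGUES = {
    "library_a": ["Marine Biology", "Ocean Currents"],
    "library_b": ["Marine Biology", "Coastal Ecology"],
    "bookdealer_c": ["Marine Engineering"],
}


def search_one(source, query):
    """Search a single catalogue for titles that contain the query string."""
    return [
        (source, title)
        for title in CATALOGUES[source]
        if query.lower() in title.lower()
    ]


def search_all(query):
    """Send one query to all catalogues in parallel and merge the hits by title."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        # pool.map returns the results in the order of the input sources.
        for hits in pool.map(lambda source: search_one(source, query), CATALOGUES):
            for source, title in hits:
                merged.setdefault(title, []).append(source)
    return merged
```

The sketch also shows why the search options are limited: only a query form that every target system understands (here a plain title word) can be passed on to all of them.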

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the different available book databases combined
should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
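
The use of a thesaurus before searching can be sketched in Python as follows. The thesaurus fragment is invented for illustration, and the Boolean syntax (AND, OR, quoted phrases) is an assumption: the query syntax differs from one search system to another.

```python
# Hypothetical thesaurus fragment; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],  # broader term
        "NT": ["bay"],            # narrower term
        "RT": ["coast"],          # related term
        "UF": ["ocean"],          # synonym (used for)
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term followed by the thesaurus terms found via the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts):
    """Combine the expanded terms: OR within one concept, AND between concepts."""
    groups = []
    for concept in concepts:
        quoted = ['"%s"' % term for term in expand(concept)]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
# prints: ("sea" OR "ocean" OR "bay") AND ("pollution")
```

Adding synonyms and narrower terms with OR mainly increases recall; choosing the more suitable terms suggested by the thesaurus helps precision.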

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
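
The two directions of citation searching can be sketched in Python with a small citation graph. The document names and citation links are invented for illustration; a real citation index holds millions of such links.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed):
    """Follow the reference lists backwards in time, starting from the seed document."""
    found = []
    queue = list(REFERENCES.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(REFERENCES.get(doc, []))  # the snowball effect
    return found


def cited_by(doc):
    """Citation index lookup: the documents whose reference lists contain `doc`."""
    return sorted(d for d, refs in REFERENCES.items() if doc in refs)
```

Here `snowball("seed")` leads to the older documents, while `cited_by("seed")` leads to the more recent ones; the length of the latter list is the number of citations received by the seed document.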

211

***-

Information retrieval:
using citations (scheme)
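[Scheme: on a time axis (past, now, future), snowball citation searching leads from the seed document back to older publications, while citation indexing leads forward to more recent publications that cite the seed document.]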

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
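
The variants listed above can be generated with a small helper, sketched here in Python. It assumes that the default document of the directory is named index.html; on a real server the default document may have another name (for instance index.htm or default.html), so the list is not necessarily complete.

```python
def url_variants(url):
    """List the forms under which other pages may link to the same document.

    Assumption: the default document of the directory is called index.html.
    """
    base = url.split("//", 1)[-1]          # drop the scheme, keep host and path
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]  # keep the trailing slash
    stem = base.rstrip("/")
    return [
        "//" + stem + "/index.html",
        "//" + stem + "/",
        "//" + stem,
    ]
```

Each variant can then be used in a separate link search, and the results can be combined.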

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
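
The "more links received, higher rank" idea can be illustrated with a minimal sketch. The link data and the function name rank_by_inlinks are hypothetical; this is not Google's actual method, only a count of inbound links.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Sort pages by number of inbound links (descending), ties alphabetically.

    links maps each page to the list of pages it links to.
    """
    counts = Counter()
    for source, targets in links.items():
        counts[source] += 0  # make sure pages without inbound links are listed
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical directory entries and the links between them.
EXAMPLE_LINKS = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["c", "b"]}
RANKED = rank_by_inlinks(EXAMPLE_LINKS)
```

Here page "c" receives three links and "b" two, so they come before "a" and "d", which receive none.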

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
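
The two components described above (a machine-built index and a search function over it) can be sketched in a few lines. This is a minimal illustration only: the pages are hypothetical in-memory texts rather than documents collected by a robot, and real systems add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for what a crawler would have collected.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediment",
    "http://example.org/c": "civil engineering handbook",
}
INDEX = build_index(PAGES)
```

A query is answered from the index alone, which is why the live Internet does not need to be searched at query time.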

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
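
The two ranking criteria above (many pages point to it, "important" pages point to it) can be sketched with a simple iterative link score in the style of the published PageRank idea. This is an illustration under stated assumptions, not Google's actual algorithm: the damping value 0.85, the number of iterations, the function name link_scores and the link data are all chosen for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = links.get(page) or list(pages)
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical link graph: "x" and "y" each receive exactly one link,
# but "x" is linked from "hub", which itself receives two links.
EXAMPLE_GRAPH = {"a": ["hub"], "b": ["hub"], "hub": ["x"], "c": ["y"]}
SCORES = link_scores(EXAMPLE_GRAPH)
```

In this example "x" ends up with a higher score than "y" although both receive a single link, because the page pointing to "x" is itself more "important".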

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
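
What such a tilde expansion amounts to can be sketched as follows. The synonym table and the function expand_query are hypothetical; how Google selects synonyms internally is not described here, so this only shows the principle of replacing a marked word by an OR group.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "sea": ["ocean"],
}


def expand_query(query, synonyms):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)
```

For example, expand_query("cheap ~car rental", SYNONYMS) turns the marked word into a group of alternatives and leaves the other words untouched.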

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
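
Recall and precision, the two measures named above, can be computed for a single search as in the following minimal sketch. The function name and the document identifiers are made up for illustration; in practice the set of relevant documents is usually only estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# 2 of the retrieved documents are among the relevant ones.
PRECISION, RECALL = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

In this example half of the retrieved documents are relevant (precision) and two thirds of the relevant documents were found (recall).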

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
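
For readers unfamiliar with truncation, the following minimal sketch shows what a system that does support it would do with a query term ending in an asterisk. The vocabulary and the function expand_truncation are hypothetical, and the asterisk as truncation symbol is an assumption; symbols differ between retrieval systems.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' to all matching index words."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(word for word in vocabulary if word.lower().startswith(stem))
    return [term] if term in vocabulary else []


# Hypothetical index vocabulary.
VOCABULARY = {"library", "libraries", "librarian", "liberty"}
MATCHES = expand_truncation("librar*", VOCABULARY)
```

Without truncation or stemming, the searcher has to type such word variants explicitly and combine them with OR.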

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
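
A proximity operator such as NEAR can be sketched as a check on word positions. The function below is a hypothetical illustration of the principle, with the maximum distance counted in words; actual systems define NEAR in their own ways.

```python
def near(text, word1, word2, max_distance=5):
    """Return True if word1 and word2 occur within max_distance words of each other."""
    words = text.lower().split()
    positions1 = [i for i, word in enumerate(words) if word == word1.lower()]
    positions2 = [i for i, word in enumerate(words) if word == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


SAMPLE_TEXT = "open access to scientific information sources"
```

In the sample text, "open" and "information" are four words apart, so they match with a maximum distance of 4 but not with a maximum distance of 3.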

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
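
Grouping results on the fly can be illustrated with a deliberately simple sketch. It is not Vivisimo's method, which is not described here: each result snippet is simply filed under the word it shares with the most other snippets. The stopword list, the snippets and the function name cluster_snippets are assumptions made for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "of", "the"}


def cluster_snippets(snippets, query):
    """Group snippets under the most widely shared word they contain."""
    query_words = set(query.lower().split())
    token_sets = [
        {w for w in s.lower().split() if w not in STOPWORDS and w not in query_words}
        for s in snippets
    ]
    # Number of snippets in which each word occurs.
    doc_freq = Counter(w for tokens in token_sets for w in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        if tokens:
            label = min(tokens, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "pascal philosopher and mathematician",
    "pascal programming language compiler",
    "blaise pascal philosopher pensees",
    "turbo pascal programming tutorial",
]
CLUSTERS = cluster_snippets(SNIPPETS, "pascal")
```

Only the retrieved snippets are analysed, so nothing has to be prepared before the search; the two meanings of the query end up under the headings "philosopher" and "programming".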

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
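
The integration of results with removal of repeated results can be sketched as below. The round-robin interleaving and the crude URL normalisation (dropping a trailing slash) are assumptions chosen for the example; actual meta-search systems may merge and rank quite differently.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:  # this source has no more results
                continue
            key = url.rstrip("/")  # crude normalisation, for illustration only
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://a.org/", "http://b.org"]
ENGINE_2 = ["http://b.org", "http://c.org", "http://a.org"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
```

Each URL appears once in the merged list, in the order in which it was first encountered across the sources.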

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
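
How a tracking program can tell changed pages from new ones is sketched below. It assumes that the page texts have already been downloaded by some other means and that a fingerprint of each page was stored at the previous visit; the function names and URLs are hypothetical, and sending the alert by email is left out.

```python
import hashlib


def fingerprint(content):
    """Return a short, stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(stored, current):
    """Compare stored fingerprints with the current page texts.

    stored:  URL -> fingerprint recorded at the previous visit
    current: URL -> page text as retrieved now
    Returns (changed_urls, new_urls), both sorted.
    """
    changed = sorted(
        url for url, text in current.items()
        if url in stored and stored[url] != fingerprint(text)
    )
    new = sorted(url for url in current if url not in stored)
    return changed, new


# Hypothetical state from the previous visit and the pages found today.
STORED = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
CURRENT = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
CHANGED, NEW = detect_changes(STORED, CURRENT)
```

Only the fingerprints need to be kept between visits, not the full page texts.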

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
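
The matching of a new book against a stored interest profile can be sketched as follows. This is only an illustration under simple assumptions: the profile layout, the function name `matches_profile` and all book records are invented here, and real services such as Amazon may work quite differently.

```python
def matches_profile(book, profile):
    """True if the book shares a keyword or a subject category with the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile, as it might be stored on the server.
profile = {
    "keywords": ["aquaculture", "plankton"],
    "categories": ["Marine biology"],
}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Plankton of the North Sea", "category": "Oceanography"},
    {"title": "Coastal Engineering", "category": "Civil engineering"},
    {"title": "Fish Farming", "category": "Marine biology"},
]

# Titles for which the user would receive an alert.
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```

The first book matches through a keyword, the third through its subject category; the second matches neither and triggers no alert.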

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
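
A minimal sketch of such simultaneous searching is given below. The three target functions are invented stand-ins for remote catalogues (they do not contact any real system), and merging on title and year is only one possible way to remove duplicates.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for remote catalogues; names and records are invented.
def search_library(query):
    return [{"title": "Marine Ecology", "year": 1998}]


def search_bookshop(query):
    return [{"title": "marine ecology", "year": 1998},
            {"title": "Ocean Currents", "year": 2003}]


def search_offline(query):
    raise ConnectionError("target system not available")


def safe_search(target, query):
    """Query one target; an unavailable target simply yields no records."""
    try:
        return target(query)
    except ConnectionError:
        return []


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda t: safe_search(t, query), targets))
    merged, seen = [], set()
    for records in result_lists:
        for record in records:
            # Treat records with the same title (ignoring case) and year as duplicates.
            key = (record["title"].lower(), record["year"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


hits = meta_search("marine", [search_library, search_bookshop, search_offline])
print([h["title"] for h in hits])
```

The sketch also shows why the result depends on the target systems: the unavailable target contributes nothing, and only the lowest common denominator of search options (here a single query string) is passed on.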

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
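
The use of a thesaurus before querying a database can be sketched as follows. The tiny thesaurus, its entries and the function name `expand_query` are all invented for illustration, and the OR syntax is an assumption; the actual query syntax depends on the search system used.

```python
# Toy thesaurus: every entry below is invented for illustration.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],   # broader term
        "NT": ["deep sea"],     # narrower term
        "RT": ["coast"],        # related term
        "UF": ["sea"],          # synonym
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return an OR query made of the term plus its terms for the chosen relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that a search engine can treat them as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
```

Adding synonyms and narrower terms mainly aims at higher recall; replacing a vague term with a more suitable one found in the thesaurus is what can improve precision.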

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
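
Both directions of citation searching can be sketched on a small citation graph. All document names below are invented, and the dictionary `CITES` merely stands in for the reference lists that a real citation index would hold.

```python
# Hypothetical citation data: each key cites the documents in its list.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def backward(doc, cites):
    """Snowball searching: follow reference lists back in time, step by step."""
    found, queue = [], list(cites.get(doc, []))
    while queue:
        ref = queue.pop(0)
        if ref not in found:
            found.append(ref)
            # The references of a reference are followed as well (snowball effect).
            queue.extend(cites.get(ref, []))
    return found


def forward(doc, cites):
    """Citation-index lookup: which documents cite `doc`?"""
    return sorted(d for d, refs in cites.items() if doc in refs)


print(backward("seed", CITES))
print(forward("seed", CITES))
```

Backward searching needs only the documents themselves, whereas forward searching needs an index built over many citing documents, which is why a citation index is required for that direction.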

211

***-

Information retrieval:
using citations (scheme)
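[Scheme: a time axis with the seed document at “Now”. Snowball citation searching leads from the seed document back to publications in the past; citation indexing leads forward to later publications that cite the seed document.]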

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
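
Such an estimate can be sketched as a simple count over citation data. The papers, the author names and the function `citations_per_author` are invented for illustration; the sketch counts every citing document once per cited paper and does not exclude self-citations.

```python
from collections import Counter

# Hypothetical data: authors of each paper, and the papers cited by each paper.
AUTHORS_OF = {
    "paper1": ["Ada"],
    "paper2": ["Ada", "Ben"],
    "paper3": ["Ben"],
}
CITES = {
    "paper3": ["paper1", "paper2"],
    "paper4": ["paper2"],
}


def citations_per_author(authors_of, cites):
    """Count, for each author, the citations received by his or her papers."""
    counts = Counter()
    for citing, refs in cites.items():
        for ref in set(refs):  # a paper cited twice in one document counts once
            for author in authors_of.get(ref, []):
                counts[author] += 1
    return counts


counts = citations_per_author(AUTHORS_OF, CITES)
print(counts["Ada"], counts["Ben"])
```

The count is only as complete as the underlying citation data, which is why the result of a citation search remains an estimate.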

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
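
The variants listed above can be generated automatically before running a link search. The function name `url_variants` is invented here, and the sketch handles only the index-file and trailing-slash variants; a real site may have other aliases.

```python
def url_variants(url):
    """Return plausible spellings of the same page address, without the scheme."""
    # Drop "http://" or "https://" if present.
    rest = url.split("://", 1)[-1]
    variants = [rest]
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]  # keeps the trailing slash
            variants.append(rest)
            break
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the result sets combined.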

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
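
The difference between alphabetical order and link-based order can be sketched in a few lines of Python. This is only a minimal illustration of the "more links received, higher rank" idea stated above, not the method actually used by Google; the site names and the link graph are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "siteA": ["siteB", "siteC"],
    "siteB": ["siteC"],
    "siteC": [],
    "siteD": ["siteC", "siteB"],
}

def rank_by_inlinks(entries, links):
    """Sort directory entries by number of incoming links (ties alphabetically)."""
    inlinks = Counter(target for targets in links.values() for target in targets)
    return sorted(entries, key=lambda site: (-inlinks[site], site))

CATEGORY = ["siteD", "siteB", "siteA", "siteC"]
print(sorted(CATEGORY))                  # alphabetical order, as in DMOZ
print(rank_by_inlinks(CATEGORY, LINKS))  # link-based order
```

In this toy graph siteC receives three links and siteB two, so both move ahead of the sites that receive none.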

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory
can lead to
• a directory limited to
sources in/of a country or region
• a directory restricted to
a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
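
The two components described above (an index built automatically from page texts, and a search function over that index) can be sketched as follows. This is a minimal illustration, assuming that the pages have already been collected by the robot; the URLs and texts in PAGES are hypothetical, and real systems add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection standing in for pages fetched by a crawler.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Civil engineering handbook",
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only; no page is fetched at search time, which is why such systems can respond quickly.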

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
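
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be combined in an iterative calculation. The sketch below is a simplified link-analysis scheme in the spirit of the published PageRank idea; it is not Google's actual implementation, and the link graph, the damping value of 0.85 and the number of iterations are assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively pass each page's score along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical graph: "a" and "b" each receive exactly one link,
# but the link to "a" comes from the heavily linked page "c".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
SCORES = link_scores(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

Page "c" ranks highest because two pages point to it, and "a" ranks above "b" although both receive a single link, because the link to "a" comes from a more important page.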

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
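
What such an automatic expansion amounts to can be sketched as follows. This is a minimal illustration, assuming a small hypothetical synonym table (SYNONYMS); how Google selects its synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word as is
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental", SYNONYMS))
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.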

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
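
Recall and precision, the two quality measures named above, can be computed for a single search as follows. The document identifiers in the example are hypothetical; in practice the set of all relevant documents is rarely known exactly, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant ones are among those retrieved.
PRECISION, RECALL = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(PRECISION, round(RECALL, 2))
```

Here half of the retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).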

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
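
To make concrete what these two features do in systems that offer them, here is a minimal sketch. The vocabulary is hypothetical and the stemmer is deliberately naive (it only strips a few English suffixes); real systems use more careful algorithms.

```python
# Hypothetical index vocabulary.
VOCABULARY = ["librarian", "libraries", "library", "liberty",
              "index", "indexes", "indexing"]

def truncate_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [word for word in vocabulary if word.startswith(prefix)]
    return [word for word in vocabulary if word == pattern]

def crude_stem(word):
    """Strip a few common English suffixes (a deliberately naive stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(truncate_match("librar*", VOCABULARY))
print({word: crude_stem(word) for word in ["index", "indexes", "indexing"]})
```

With truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form on its own.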

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
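
A proximity operator such as NEAR tests whether two words occur close to each other in a text. The following minimal sketch shows the idea; the default window of 5 words is an assumption, as each retrieval system defines its own distance.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

TEXT = "Open access journals and repositories"
print(near(TEXT, "open", "repositories"))                  # 4 words apart
print(near(TEXT, "open", "repositories", max_distance=2))  # window too small
```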

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
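
Clustering "on the fly" means that the headings are derived from the retrieved results themselves, at search time. The sketch below is a toy illustration of that idea, not the algorithm used by Vivisimo: it groups result titles under words that they share, apart from the query words. The titles and the stopword list are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under shared words that are not in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    clusters = defaultdict(list)
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        for word in sorted(words - query_words - STOPWORDS):
            clusters[word].append(title)
    # Keep only headings shared by at least two results.
    return {heading: found for heading, found in clusters.items() if len(found) >= 2}

# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
CLUSTERS = cluster_results("pascal", TITLES)
for heading, found in CLUSTERS.items():
    print(heading, "->", found)
```

The ambiguous query is split into a "philosopher" group and a "programming" group without any classification of the documents before the search.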

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
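
Integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration, assuming that each primary search system returns a ranked list of URLs; the round-robin interleaving and the crude URL normalisation (lower-casing, dropping a trailing slash) are simplifying assumptions, and the URLs are hypothetical.

```python
def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation so that trivial variants count as one URL.
                key = results[rank].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://a.example/", "http://b.example/"]
ENGINE_2 = ["http://b.example", "http://c.example/", "http://A.example"]
print(merge_results([ENGINE_1, ENGINE_2]))
```

Five results from two systems are reduced to three unique ones, with the top-ranked result of each system shown first.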

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet, ftp, ...
Internet / WWW

• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Word files
»PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
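
The distinction between modified pages and new pages can be sketched with stored fingerprints of page contents. This is a minimal illustration: the page texts are passed in directly instead of being downloaded, and the URLs are hypothetical. An alert service would repeat such a comparison at regular intervals and send the outcome by email.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare stored fingerprints with the current page contents.

    previous: dict mapping URL to the fingerprint of the last visit
    current:  dict mapping URL to the page text found now
    Returns (changed URLs, new URLs), both sorted.
    """
    changed = [url for url, text in current.items()
               if url in previous and previous[url] != fingerprint(text)]
    new = [url for url in current if url not in previous]
    return sorted(changed), sorted(new)

# Hypothetical state of the previous visit and the pages found today.
PREVIOUS = {
    "http://example.org/news": fingerprint("version 1"),
    "http://example.org/about": fingerprint("unchanged text"),
}
CURRENT = {
    "http://example.org/news": "version 2",
    "http://example.org/about": "unchanged text",
    "http://example.org/report": "a page that did not exist before",
}
print(detect_changes(PREVIOUS, CURRENT))
```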

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
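
As an illustration only, the matching of a new book against stored interest profiles could be sketched as follows. The profile structure, the field names and the sample data are assumptions made for this sketch; they do not describe how Amazon or any other system actually works.

```python
def matching_profiles(book, profiles):
    """Return the users whose stored keywords or subject categories match a new book."""
    title_words = set(book["title"].lower().split())
    matches = []
    for user, profile in profiles.items():
        # A profile may hold keywords, subject categories, or both.
        keyword_hit = title_words & {k.lower() for k in profile.get("keywords", [])}
        category_hit = book["category"] in profile.get("categories", [])
        if keyword_hit or category_hit:
            matches.append(user)
    return matches


# Hypothetical profiles stored on the server of the system.
profiles = {
    "alice": {"keywords": ["hydrology"]},
    "bob": {"categories": ["Marine science"]},
    "carol": {"keywords": ["astronomy"]},
}
new_book = {"title": "Coastal Hydrology Explained", "category": "Marine science"}
alerted = matching_profiles(new_book, profiles)
```

In this sketch a keyword profile is matched against the words of the title only; a real service would probably also use subject terms and descriptions.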

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
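
The principle of simultaneous searching can be sketched in a few lines. The catalogues below are made-up in-memory lists standing in for remote target systems, and the only search option is a simple title search, which mirrors the limited search options mentioned above. This is a sketch under those assumptions, not the method of any real meta-search service.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of libraries and bookdealers.
CATALOGUES = {
    "library_a": ["Marine Biology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Hydrology"],
    "bookdealer_c": ["Marine Biology", "Fishing Handbook"],
}


def search_catalogue(name, term):
    """Title search in one catalogue: the lowest common denominator of all targets."""
    return [(title, name) for title in CATALOGUES[name] if term.lower() in title.lower()]


def meta_search(term):
    """Send one query to all catalogues in parallel and merge the hits per title."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_catalogue(name, term), CATALOGUES))
    merged = {}
    for hits in result_lists:
        for title, source in hits:
            merged.setdefault(title, []).append(source)
    return merged


results = meta_search("ocean")
```

The merged result lists each title once, together with the catalogues that hold it, which is where the good coverage comes from.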

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
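
A minimal sketch of this use of a thesaurus: the searcher looks up a term, collects synonyms (UF) and narrower terms (NT), and combines them into one query. The thesaurus entry and the OR syntax are assumptions for illustration; real thesaurus systems and search engines each have their own contents and query syntax.

```python
# A tiny, made-up thesaurus entry; real entries would come from a thesaurus system.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea")
```

Adding synonyms and narrower terms mainly increases recall; choosing only the most suitable terms from the thesaurus is what helps precision.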

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
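
Both directions of citation searching can be illustrated with a small, made-up set of citation data. The document names and the data structure are hypothetical; a real citation index holds the same kind of relation for millions of documents.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed):
    """Follow reference lists backwards in time, repeatedly (snowball effect)."""
    found, to_visit = set(), [seed]
    while to_visit:
        for cited in CITES.get(to_visit.pop(), []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def citing_documents(seed):
    """Citation index lookup: more recent documents that cite the seed document."""
    return {doc for doc, refs in CITES.items() if seed in refs}


older = snowball("seed")
newer = citing_documents("seed")
```

The backward search needs only the reference lists of the documents themselves, whereas the forward search needs an index built over many documents.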

211

***-

Information retrieval:
using citations (scheme)
(Scheme: a time axis marked Past, Now, Future. Starting from the seed
document, snowball citation searching leads towards the past and
citation indexing leads towards the future.)

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
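
The difference between the two approaches can be shown on a few made-up pages. The page contents are assumptions for this sketch, and the code only imitates locally what a search engine does in its index: a search in hyperlink fields finds real links, while a normal text search also finds pages that merely mention the URL.

```python
from html.parser import HTMLParser

TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"

# Hypothetical pages: one links to the target, one only mentions its address.
PAGES = {
    "page1": f'<p>See <a href="{TARGET}">this page</a>.</p>',
    "page2": f"<p>The address {TARGET} is worth a visit.</p>",
    "page3": "<p>Nothing relevant here.</p>",
}


class LinkCollector(HTMLParser):
    """Collect the href values of all hyperlinks in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def cites_by_link(html):
    """Specialised search: look only in the hyperlink fields of the page."""
    collector = LinkCollector()
    collector.feed(html)
    return TARGET in collector.links


def cites_by_text(html):
    """Normal search: look for the URL string anywhere in the page."""
    return TARGET in html


by_link = [name for name, html in PAGES.items() if cites_by_link(html)]
by_text = [name for name, html in PAGES.items() if cites_by_text(html)]
```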

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 22


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
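
The ranking idea above can be sketched as follows. This is only a minimal illustration that assumes a plain count of incoming links; the link analysis actually used by Google is more elaborate. The function name `rank_by_inlinks` and the example site names are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(category_sites, links):
    """Order the sites of one directory category by number of incoming links.

    links is an iterable of (source_site, target_site) pairs.
    """
    # Count incoming links per site, ignoring links from a site to itself.
    inlinks = Counter(target for source, target in links if source != target)
    # Most-linked sites first; ties fall back to alphabetical order.
    return sorted(category_sites, key=lambda site: (-inlinks[site], site))


# Hypothetical example data.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "b.example"),
]
ranked = rank_by_inlinks(["a.example", "b.example", "c.example"], links)
print(ranked)
```

A purely alphabetical directory would list a.example first; the link-based order puts the most-cited site on top.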

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
a directory limited to sources in/of a country or region, or to
a directory restricted to a specific subject domain (“portal”).]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow the user to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
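
The two components described above (an index built from collected page texts, and a search function over that index) can be sketched in a few lines. This is a toy illustration, not the implementation of any real search engine: the page texts stand in for what a crawler would have collected, the URLs are hypothetical, and combining query words with AND is an assumption made for this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    pages maps each URL to the text that a crawler collected for it.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for the output of a crawler.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the stored index only; no page is fetched at search time, which is why such systems can respond quickly.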

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
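
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified, PageRank-style iteration. This is a sketch under stated assumptions, not the algorithm as Google runs it: the damping factor 0.85, the number of iterations and the tiny example graph are illustrative choices.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Compute link-based scores for the pages of a small link graph.

    graph maps each page to the list of pages it links to; every linked
    page must itself be a key of graph.
    """
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its outgoing links,
                # so a link from a high-scoring page is worth more.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical graph: three pages point to "hub"; nobody points to "c".
graph = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = link_scores(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example "a" and "c" each have one outgoing link, but "a" ends up with a higher score than "c" because it is linked from the highly ranked "hub".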

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
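
What such an expansion amounts to can be sketched as follows. The synonym list used here is a hypothetical mini-thesaurus for illustration only; which synonyms Google actually adds is not documented in these slides, and the OR notation is just one way to write the expanded query.

```python
# Hypothetical mini-thesaurus, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each ~word as an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            # Words without a tilde are left unchanged.
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("~cheap ~car rental brussels")
print(expanded)
```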

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
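
Recall and precision, the two measures mentioned above, can be computed as in the following sketch. The document identifiers are made up for the example; in practice the set of relevant documents is rarely known completely, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = fraction of the retrieved documents that are relevant
    recall    = fraction of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant documents exist.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(precision, recall)
```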

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
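
For comparison, manual right-hand truncation as offered by some other retrieval systems can be sketched as follows. The `*` wildcard notation and the small vocabulary are assumptions made for this illustration; systems differ in the truncation symbol they use.

```python
def truncation_matches(pattern, vocabulary):
    """Return the index terms matched by a right-truncated pattern.

    A pattern such as 'librar*' matches every term that starts with 'librar';
    a pattern without '*' matches only the identical term.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(t for t in vocabulary if t.lower().startswith(stem))
    return sorted(t for t in vocabulary if t.lower() == pattern.lower())


# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty", "index"]
matches = truncation_matches("librar*", vocabulary)
print(matches)
```

Without truncation, the searcher has to type each of these word forms separately to reach the same coverage.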

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
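
What a proximity operator such as NEAR does can be sketched as follows. The maximum distance of 5 words is an arbitrary assumption; systems that offer the operator each define their own limit.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


text = ("Open access journals are indexed by several free "
        "bibliographic databases on the WWW.")
print(near(text, "free", "databases", max_distance=3))
print(near(text, "journals", "databases", max_distance=3))
```

A plain AND query would accept the text in both cases; the proximity condition rejects the pair of words that are far apart.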

96

**--

Meta- search systems:
scheme 1
[Diagram with the elements: User; Client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems; In / Out.]

97

**--

Meta- search systems:
scheme 2
[Diagram with the elements: User;
Client computer + Multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems; In / Out.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User; Client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems; In / Out.]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
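
Clustering of results on the fly can be illustrated with a deliberately naive sketch. This is not the method used by Vivisimo, which is not described in these slides; it simply labels each result snippet with the most widely shared word that is not part of the query. The stop word list and the snippets are made up for the example.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "on", "by"}


def cluster_snippets(query, snippets):
    """Group result snippets under the most widely shared non-query word."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def terms(text):
        return {w for w in re.findall(r"\w+", text.lower())
                if w not in STOPWORDS and w not in query_words}

    # Count in how many snippets each candidate word occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Prefer the word shared by most snippets; break ties alphabetically.
            label = max(candidates, key=lambda w: (doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal the philosopher: Pensees",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
]
clusters = cluster_snippets("pascal", snippets)
for label, members in clusters.items():
    print(label, len(members))
```

All work happens after the results have been retrieved, using only the result snippets, which is the sense of "on the fly" here.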

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User;
Client computer + Multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems; In / Out.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
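
The integration step described above can be sketched as follows, assuming that each primary search system returns a ranked list of URLs. The engine names, the URLs and the normalisation rules (ignoring "www." and a trailing slash) are assumptions made for this illustration; real meta-search systems use their own merging rules.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a key so that trivial variants count as the same page."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(result_lists):
    """Merge ranked URL lists from several engines and remove repeats.

    result_lists maps an engine name to its ranked list of URLs.
    """
    merged = {}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalize(url), {"url": url, "engines": [], "best_rank": rank}
            )
            if engine not in entry["engines"]:
                entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Results found by more engines come first, then the best rank.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["engines"]), e["best_rank"], e["url"]),
    )


# Hypothetical result lists from two primary search systems.
results = {
    "engine_a": ["http://www.example.org/oceans/", "http://example.org/tides"],
    "engine_b": ["http://example.org/oceans", "http://example.net/waves"],
}
merged = merge_results(results)
for entry in merged:
    print(entry["url"], entry["engines"])
```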

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
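
Tracking changes in known pages can be sketched with content fingerprints. This is a minimal illustration, not the method of any particular alert service: fetching the pages and sending the e-mail are left out, and the page contents below are made-up byte strings.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(known, snapshot):
    """Compare freshly fetched pages with the fingerprints of the last check.

    known maps URL -> fingerprint stored at the previous check;
    snapshot maps URL -> content (bytes) fetched now.
    Returns (changed_urls, updated_fingerprints). Pages seen for the
    first time are reported as changed.
    """
    changed = []
    updated = dict(known)
    for url, content in snapshot.items():
        digest = fingerprint(content)
        if known.get(url) != digest:
            changed.append(url)
        updated[url] = digest
    return changed, updated


url = "http://example.org/news"  # hypothetical page
changed_1, known = detect_changes({}, {url: b"<p>version 1</p>"})
changed_2, known = detect_changes(known, {url: b"<p>version 1</p>"})
changed_3, known = detect_changes(known, {url: b"<p>version 2</p>"})
print(changed_1, changed_2, changed_3)
```

Only the fingerprints have to be stored between checks, not the full pages.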

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
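
The principle of such an alert service (stored queries that are run again regularly, reporting only results not seen before) can be sketched as follows. This does not describe how Google Alert is implemented; `run_query` is a hypothetical function standing in for a call to an Internet index, and the example data is made up.

```python
def new_hits(stored_queries, seen, run_query):
    """Run every stored query again and report only the unseen results.

    stored_queries: list of query strings saved by the subscriber
    seen: dict query -> set of URLs already reported (updated in place)
    run_query: function that returns a list of URLs for a query
    """
    alerts = {}
    for query in stored_queries:
        current = run_query(query)
        previous = seen.setdefault(query, set())
        fresh = [url for url in current if url not in previous]
        if fresh:
            alerts[query] = fresh
        previous.update(current)
    return alerts


# Hypothetical stand-in for an Internet index.
fake_index = {"marine biology": ["http://example.org/a"]}


def fake_search(query):
    return list(fake_index.get(query, []))


seen = {}
first_alerts = new_hits(["marine biology"], seen, fake_search)
fake_index["marine biology"].append("http://example.org/b")
second_alerts = new_hits(["marine biology"], seen, fake_search)
third_alerts = new_hits(["marine biology"], seen, fake_search)
print(first_alerts, second_alerts, third_alerts)
```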

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: the name is spelled “amazon”, NOT “amazone”
Subject description is poor.
• A9:
http://a9.com
Both Amazon and A9 allow full-text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Services that compare the catalogues and prices of bookshops
(as well as shops for music, movies and many other goods)
are also available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
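
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration under assumed names (`InterestProfile`, `NewBook`, `alerts` are hypothetical); it does not describe how Amazon or any other service actually implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: free keywords and/or subject categories."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


@dataclass
class NewBook:
    """A newly published book with the subject categories assigned to it."""
    title: str
    categories: set = field(default_factory=set)


def matches(profile, book):
    """Return True if the book fits the profile by keyword or by category."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = bool(profile.categories & book.categories)
    return keyword_hit or category_hit


def alerts(profile, new_books):
    """Return the titles of the new books that the user should be told about."""
    return [book.title for book in new_books if matches(profile, book)]
```

A profile with the keyword "oceanography" and the category "Marine biology" would be alerted to a new title containing that word, or to any new book filed under that category.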

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
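
A minimal sketch of simultaneous searching, assuming three hypothetical target systems that are simulated here by local functions (no real catalogue is contacted). It shows why the result depends on the availability of the targets: a target that fails is skipped and reported, and the remaining results are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library_a(query):
    # Hypothetical stand-in for a library catalogue.
    catalogue = ["Marine Ecology", "Ocean Currents"]
    return [title for title in catalogue if query.lower() in title.lower()]


def search_bookshop_b(query):
    # Hypothetical stand-in for a bookshop database.
    catalogue = ["Ocean Currents", "Ocean Life"]
    return [title for title in catalogue if query.lower() in title.lower()]


def search_offline_c(query):
    # Simulates a target system that is not available.
    raise ConnectionError("target system unavailable")


def meta_search(query, targets):
    """Query all targets in parallel, skip failing ones, merge without duplicates."""
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    for name, future in futures.items():
        try:
            hits = future.result()
        except Exception:
            failed.append(name)
            continue
        for title in hits:
            if title not in merged:
                merged.append(title)
    return merged, failed


TARGETS = {
    "library_a": search_library_a,
    "bookshop_b": search_bookshop_b,
    "offline_c": search_offline_c,
}
```

Only one simple query string is passed to every target, which reflects the limited search options of such meta-search services.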

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text, of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• It does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also small versions of the images
(so-called “thumbnails”) directly in the result list.

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail
but also the origin of the image as a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
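
The four relations can be modelled as a small data structure. The sketch below is an assumption-laden illustration: the class name, the helper `missing_reciprocals` and the sample terms are all hypothetical and not taken from any existing thesaurus. It also checks the usual convention that BT and NT are each other's inverse and that RT is symmetric.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with its BT, NT, RT and UF relations."""
    term: str
    bt: list = field(default_factory=list)  # broader terms
    nt: list = field(default_factory=list)  # narrower terms
    rt: list = field(default_factory=list)  # related terms
    uf: list = field(default_factory=list)  # synonyms this term is used for


def build_thesaurus(entries):
    """Index the entries by their term."""
    return {entry.term: entry for entry in entries}


def missing_reciprocals(thesaurus):
    """List (term, relation, target) triples whose inverse link is absent."""
    problems = []
    for entry in thesaurus.values():
        for broader in entry.bt:
            if broader in thesaurus and entry.term not in thesaurus[broader].nt:
                problems.append((entry.term, "BT", broader))
        for narrower in entry.nt:
            if narrower in thesaurus and entry.term not in thesaurus[narrower].bt:
                problems.append((entry.term, "NT", narrower))
        for related in entry.rt:
            if related in thesaurus and entry.term not in thesaurus[related].rt:
                problems.append((entry.term, "RT", related))
    return problems


# Illustrative sample data only.
SAMPLE = build_thesaurus([
    ThesaurusEntry("body of water", nt=["sea"]),
    ThesaurusEntry("sea", bt=["body of water"], nt=["bay"], rt=["ocean"], uf=["seas"]),
    ThesaurusEntry("ocean"),  # the RT link back to "sea" is deliberately missing
    ThesaurusEntry("bay", bt=["sea"]),
])
```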

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
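
The procedure described above can be sketched as a small query-expansion helper. The thesaurus content, the function name `expand_query` and the `OR` / quotation-mark syntax are assumptions for illustration; the actual query syntax differs from one search system to another.

```python
# Illustrative thesaurus extract: synonyms (UF) and narrower terms (NT).
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters", "bay"]},
}


def expand_query(term, thesaurus, include_narrower=True):
    """Combine a term with its synonyms, and optionally narrower terms, using OR."""
    entry = thesaurus.get(term, {})
    candidates = [term] + entry.get("UF", [])
    if include_narrower:
        candidates += entry.get("NT", [])
    seen, unique = set(), []
    for candidate in candidates:
        key = candidate.lower()
        if key not in seen:
            seen.add(key)
            unique.append(candidate)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = ['"%s"' % c if " " in c else c for c in unique]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms mainly raises recall; leaving out the narrower terms (`include_narrower=False`) keeps the query closer to the original concept.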

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
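
Both directions of citation searching can be sketched on a toy citation graph. The document identifiers and the function names below are hypothetical; a real citation index holds such data for millions of articles.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["a1990", "b1995"],
    "a1990": ["c1980"],
    "b1995": ["c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def snowball(seed, references, depth=2):
    """Backward search: follow reference lists from the seed to older work."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in references.get(doc, []):
                if cited != seed and cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def cited_by(target, references):
    """Forward search: the more recent documents that cite the target."""
    return sorted(doc for doc, refs in references.items() if target in refs)
```

The backward search needs only the documents themselves, whereas the forward search needs an index built over many citing documents, which is what a citation index provides.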

211

***-

Information retrieval:
using citations (scheme)
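[Scheme: a time axis (Past, Now, Future) with the seed document on it.
Snowball citation searching leads back from the seed document to older publications;
citation indexing leads forward to more recent publications that cite it.]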

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
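
Estimating such counts from the results of a citation search can be sketched as follows. The records are invented for illustration; in practice the counts are only estimates, because they depend on which citing documents the database covers.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of the cited article)
CITATIONS = [
    ("p4", "p1", "Author A"),
    ("p5", "p1", "Author A"),
    ("p5", "p2", "Author A"),
    ("p6", "p3", "Author B"),
]


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited, _author in citations)


def citations_per_author(citations):
    """Count how often the work of each author is cited."""
    return Counter(author for _citing, _cited, author in citations)
```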

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
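
Generating these variants can be sketched as follows. The function name is hypothetical, and the default document name `index.html` is an assumption: the actual name depends on the configuration of the web server.

```python
def url_variants(url, index_name="index.html"):
    """Return the spellings under which the same page may have been linked."""
    # Drop the scheme ("http://") or a leading "//" if present.
    address = url.split("://", 1)[-1].lstrip("/")
    # Drop an explicit default document name at the end.
    if address.endswith("/" + index_name):
        address = address[: -len(index_name)]
    base = address.rstrip("/")
    return ["//" + base + "/" + index_name, "//" + base + "/", "//" + base]
```

Each variant can then be submitted as a separate link search, and the results combined.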

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know,
we can use NOT ONLY an Internet index / search engine
with specialised searching in the hyperlink fields of web pages;
we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
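
What such a "normal" search does can be sketched on a small, invented collection of page texts (the addresses under example.org and the function name are hypothetical). The point is that a plain text search also finds pages that mention the URL without making a hyperlink to it.

```python
# Hypothetical page collection: address -> page text.
PAGES = {
    "http://example.org/links.html":
        'See <a href="http://www.computer.com/directory/subdirectory/web-page.html">this page</a>.',
    "http://example.org/review.html":
        "A critique of www.computer.com/directory/subdirectory/web-page.html (not hyperlinked).",
    "http://example.org/other.html":
        "Nothing relevant here.",
}


def pages_mentioning(url_fragment, pages):
    """Return the addresses of pages whose text contains the given string."""
    needle = url_fragment.lower()
    return [address for address, text in pages.items() if needle in text.lower()]
```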

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!
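
Browsing along a tree structure can be sketched with nested categories. The category names and the example.org address are invented for illustration and do not reproduce the classification of any real directory.

```python
# Hypothetical directory: categories nest; a leaf holds a list of site addresses.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Physics": {},
    },
    "Arts": {},
}


def browse(directory, path):
    """Follow a path of category names down the tree and return what is found there."""
    node = directory
    for category in path:
        node = node[category]
    return node


def subcategories(node):
    """List the categories offered at this level (none at a leaf)."""
    return sorted(node) if isinstance(node, dict) else []
```

The user only chooses among the categories offered at each level; no query has to be formulated.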

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
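
The idea of ordering directory entries by the number of links they receive can be sketched as follows. This is only a simplified illustration, not the actual method used by Google: the link graph `LINKS`, the site names and the function `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-c": ["site-a"],
    "site-d": ["site-c", "site-b"],
}

def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of distinct sites linking to them."""
    counts = Counter(
        target for targets in links.values() for target in set(targets)
    )
    # Most linked-to first; alphabetical order breaks ties.
    return sorted(entries, key=lambda entry: (-counts[entry], entry))

RANKED = rank_by_inlinks(["site-a", "site-b", "site-c", "site-d"], LINKS)
print(RANKED)
```

An alphabetical listing would start with site-a; the link-based listing starts with site-c, which receives the most links in this toy graph.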

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
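
The two components described above (a machine-made index built from page texts, and a search function that consults that index instead of the live Internet) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the pages in `PAGES` are invented stand-ins for what a robot would collect, and combining query words with AND is a choice made for this sketch only.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science resources",
    "http://example.org/c": "Civil engineering directory",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (assumed AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
```

The search only looks things up in the prebuilt index, which is why a query can be answered quickly, and also why pages that the robot has not collected cannot be found.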

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
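
The principle that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch under assumptions, not the algorithm as implemented by Google: the graph `GRAPH`, the damping value 0.85 and the number of iterations are illustrative choices.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score over the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new[other] += share
        score = new
    return score

# Hypothetical graph: "d" and "f" each receive exactly one link,
# but "d" is linked from "c", which itself receives two links.
GRAPH = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["f"], "d": [], "f": []}
SCORES = link_scores(GRAPH)
print(sorted(SCORES, key=SCORES.get, reverse=True))
```

In this toy graph, page d ends up with a higher score than page f, although both receive a single link, because the page pointing to d is itself more "important".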

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
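
What such an automatic expansion amounts to can be sketched as follows. The synonym table `SYNONYMS` and the function `expand_query` are hypothetical; the synonym lists that Google uses are not public, so this only shows the principle of turning a tilde-marked word into an OR group.

```python
# Hypothetical synonym table, used only for this illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

EXPANDED = expand_query("~cheap car rental", SYNONYMS)
print(EXPANDED)
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.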

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
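
Recall and precision, the two quality measures mentioned above, can be computed for a single search as follows. The document identifiers are invented for the illustration, and in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant ones were retrieved.
PRECISION, RECALL = precision_recall(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d2", "d3", "d5"},
)
print(PRECISION, RECALL)
```

A specialised search system aims to raise both values at once: fewer irrelevant hits (precision) and fewer missed relevant documents (recall).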

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
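
To make clear what is missing, the following sketch shows what truncation and a proximity operator do in retrieval systems that offer them. The functions `truncation_match` and `near`, the sample sentence and the distance values are hypothetical illustrations, not features or syntax of any particular search engine.

```python
import re

def words_of(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def truncation_match(pattern, text):
    """Return the words matched by a pattern such as 'librar*'."""
    stem = pattern.rstrip("*").lower()
    if pattern.endswith("*"):
        return [w for w in words_of(text) if w.startswith(stem)]
    return [w for w in words_of(text) if w == stem]

def near(text, word1, word2, max_distance=3):
    """True if both words occur within max_distance words of each other."""
    words = words_of(text)
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)

TEXT = "The library trains librarians to search marine science databases"
print(truncation_match("librar*", TEXT))
print(near(TEXT, "marine", "databases", max_distance=2))
```

Without truncation, the searcher has to enter each word form separately; without a proximity operator, a document matches even when the query words occur far apart and are unrelated.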

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
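
Clustering on the fly means that the grouping is computed from the retrieved hits themselves, at search time. The following is a very simple sketch of that idea, not the method used by Vivisimo (which is not described here): each hit is assigned to the non-query word in its title that occurs in the most hits. The titles, the stop-word list and the function `cluster_results` are invented for the illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(query.lower().split())

    def candidates(title):
        return [
            w for w in re.findall(r"[a-z]+", title.lower())
            if w not in STOPWORDS and w not in query_words
        ]

    # In how many titles does each candidate word occur?
    doc_freq = Counter(w for t in titles for w in set(candidates(t)))
    clusters = defaultdict(list)
    for title in titles:
        words = candidates(title)
        label = max(words, key=lambda w: (doc_freq[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)

TITLES = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
CLUSTERS = cluster_results("pascal", TITLES)
print(CLUSTERS)
```

For the ambiguous query pascal, the hits fall into one group about programming and one about the philosopher, so the user can select the intended meaning.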

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
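
Such an integration of results with removal of repeated hits can be sketched as follows. This is a minimal illustration under assumptions: the result lists are invented, and the normalisation rule (ignore case, a leading "www." and a trailing slash) is a deliberate simplification that a real system would refine.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key (simplified: case, 'www.', final '/')."""
    parts = urlsplit(url.lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        # Take the hit at this rank from each search system in turn.
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
ENGINE_2 = ["http://dogpile.com", "http://www.kartoo.com/"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
print(MERGED)
```

The two spellings of the same address are recognised as one hit, so the merged list contains three results instead of four.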

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
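
The distinction between modified and new pages can be sketched with stored fingerprints of page contents. This is only an illustration of the principle, not the way any of the services mentioned here works: the URLs and texts are invented, and fetching the pages from the WWW is left out, so the function receives the current texts directly.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_pages(snapshots, current_pages):
    """Compare current page texts with the stored fingerprints.

    snapshots: url -> fingerprint saved at the previous visit (updated in place)
    current_pages: url -> page text as it is now
    """
    report = {"new": [], "modified": [], "unchanged": []}
    for url, text in sorted(current_pages.items()):
        digest = fingerprint(text)
        if url not in snapshots:
            report["new"].append(url)
        elif snapshots[url] != digest:
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
        snapshots[url] = digest
    return report

# Hypothetical state after a previous visit, and the pages as found today.
SNAPSHOTS = {
    "http://example.org/news": fingerprint("Conference in May"),
    "http://example.org/about": fingerprint("About us"),
}
TODAY = {
    "http://example.org/news": "Conference in June",
    "http://example.org/about": "About us",
    "http://example.org/report": "New report available",
}
REPORT = check_pages(SNAPSHOTS, TODAY)
print(REPORT)
```

An alert service would run such a comparison at regular intervals and send the list of new and modified pages to the subscriber by email.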

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for a
particular application is not clear to most
librarians or end-users.

142

***-

Suitable book databases?

AIM                                                    RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic   ?
To find book titles published before 1990              ?
To find a book title through a title search            ?
To find the price of a book                            ?
To be informed regularly about new books               ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
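
As an illustration of the interest profile described above, the following sketch matches newly published books against a stored profile of keywords and subject categories. It is not based on how Amazon or any other bookshop implements its alerting service; the class names, the matching rule (a keyword must occur as a word in the title, or the category must be in the profile) and the book titles are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user's stored profile: keywords and/or subject categories."""
    keywords: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


@dataclass
class Book:
    title: str
    category: str


def matches(profile: InterestProfile, book: Book) -> bool:
    """True if a keyword occurs as a word in the title, or the category is in the profile."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = book.category in profile.categories
    return keyword_hit or category_hit


# Hypothetical profile and hypothetical new titles.
profile = InterestProfile(keywords={"aquaculture"}, categories={"Oceanography"})
new_books = [
    Book("Aquaculture: Principles and Practice", "Agriculture"),
    Book("Waves and Tides", "Oceanography"),
    Book("A History of Printing", "History"),
]

alerts = [b.title for b in new_books if matches(profile, b)]
print(alerts)
```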

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
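
The principle of simultaneous searching can be sketched as follows: one query is sent in parallel to several target systems, the results are merged, and targets that are unavailable are reported rather than blocking the search. This is only a sketch under stated assumptions; the three `search_catalogue_*` functions stand in for real library or bookdealer systems, the records and ISBNs are invented, and merging by ISBN is one possible choice among several.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical target systems; a real service would query remote catalogues.
def search_catalogue_a(query):
    return [{"isbn": "111", "title": "Marine Biology", "source": "A"}]


def search_catalogue_b(query):
    return [
        {"isbn": "111", "title": "Marine Biology", "source": "B"},
        {"isbn": "222", "title": "Marine Pollution", "source": "B"},
    ]


def search_catalogue_down(query):
    raise ConnectionError("target system unavailable")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results by ISBN."""
    merged, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    # Leaving the "with" block waits until every target has answered or failed.
    for name, future in futures.items():
        try:
            records = future.result()
        except Exception:
            failed.append(name)  # the result depends on the availability of the targets
            continue
        for rec in records:
            entry = merged.setdefault(rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(rec["source"])
    return merged, failed


targets = {"A": search_catalogue_a, "B": search_catalogue_b, "down": search_catalogue_down}
merged, failed = meta_search("marine", targets)
print(merged)
print("unavailable:", failed)
```

The sketch also shows why search options are limited in such services: only a query that every target understands can be passed on unchanged.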

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                    RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic   Library of Congress, British Library, (Amazon)
To search for book titles published before 1990        national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                           Library of Congress, British Library, Infoball
To find the price of a book                            Global Books in Print, Infoball, online bookshops
To be informed regularly about new books               Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
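
The relations BT, NT, RT and UF can be represented as a simple data structure. The sketch below is one possible representation, not the format of any particular thesaurus system; the class name, the sample terms and the reciprocity check (by common convention, if A has NT B, then B has BT A) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


def check_reciprocity(thesaurus: dict[str, ThesaurusEntry]) -> list[str]:
    """List BT/NT links that lack the reciprocal link in the other entry."""
    problems = []
    for entry in thesaurus.values():
        for bt in entry.broader:
            if bt in thesaurus and entry.term not in thesaurus[bt].narrower:
                problems.append(f"{bt} lacks NT {entry.term}")
        for nt in entry.narrower:
            if nt in thesaurus and entry.term not in thesaurus[nt].broader:
                problems.append(f"{nt} lacks BT {entry.term}")
    return problems


# Hypothetical entries; the last one deliberately omits its broader term.
thesaurus = {
    "water bodies": ThesaurusEntry("water bodies", narrower=["seas"]),
    "seas": ThesaurusEntry(
        "seas",
        broader=["water bodies"],
        narrower=["coastal seas"],
        related=["oceans"],
        used_for=["marine waters"],
    ),
    "coastal seas": ThesaurusEntry("coastal seas"),
}

problems = check_reciprocity(thesaurus)
print(problems)
```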

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed
with descriptors taken from a controlled list of words and
terms, the searcher can first consult one or several
thesaurus systems to find additional and more suitable
words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
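
The use of thesaurus terms in query formulation can be sketched as follows: the synonyms found for each concept are combined with OR (which tends to increase recall), and the concepts are combined with AND (which tends to preserve precision). The function name, the sample terms and the Boolean syntax are assumptions; the operators actually accepted differ from one database to another, so the help pages of the target system should be consulted.

```python
def expand_query(concepts, found_terms):
    """Build a Boolean query: terms are OR-ed per concept, concepts are AND-ed."""
    groups = []
    for concept in concepts:
        terms = [concept] + found_terms.get(concept, [])
        # Put multi-word terms between quotes so that they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical terms, as might be collected from a thesaurus beforehand.
found_terms = {
    "sea": ["ocean", "marine waters"],
    "pollution": ["contamination"],
}

query = expand_query(["sea", "pollution"], found_terms)
print(query)
```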

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
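
The two directions of citation searching can be sketched on a small citation graph. The document identifiers and the functions `snowball` and `cited_by` are hypothetical; a real citation index covers millions of documents, but the principle is the same: reference lists lead backwards in time, while the index of citing documents leads forwards.

```python
# Each key cites the documents in its list (hypothetical identifiers).
references = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1985"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def snowball(seed, references):
    """Backward search: follow reference lists from the seed, step by step."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for ref in references.get(doc, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def cited_by(seed, references):
    """Forward search: the question a citation index answers, i.e. who cites the seed."""
    return {doc for doc, refs in references.items() if seed in refs}


older = snowball("seed-2000", references)
newer = cited_by("seed-2000", references)
print(sorted(older))
print(sorted(newer))
```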

211

***-

Information retrieval:
using citations (scheme)
Time: Past → Now → Future
Seed document (Now)
Snowball citation searching: from the seed document towards the Past
Citation indexing: from the seed document towards the Future

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
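
Counting the citations received by an article or by an author can be sketched as follows. The records, article identifiers and author names are invented for this example. The resulting numbers are only estimates, because they depend on which citing documents the database happens to cover.

```python
from collections import Counter

# Hypothetical records: citing article -> list of (cited article, its author).
citing_records = {
    "article-A": [("paper-1", "Janssens"), ("paper-2", "Peeters")],
    "article-B": [("paper-1", "Janssens")],
    "article-C": [("paper-1", "Janssens"), ("paper-3", "Janssens")],
}

per_article = Counter()
per_author = Counter()
for cited_list in citing_records.values():
    for paper, author in cited_list:
        per_article[paper] += 1
        per_author[author] += 1

print(per_article.most_common())
print(per_author.most_common())
```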

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
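
The variants listed above can be generated automatically before searching. The following sketch is a simplification: the function name is hypothetical, only the default page name "index.html" is handled, and other variants (for instance with or without "www.") are not considered.

```python
def url_variants(url: str, index_name: str = "index.html") -> list[str]:
    """Return common spellings of the same web address, without the scheme."""
    # Drop the scheme ("http://") if present.
    address = url.split("://", 1)[-1]
    variants = [address]
    # Variant without the default page name, keeping the trailing slash.
    if address.endswith("/" + index_name):
        address = address[: -len(index_name)]
        variants.append(address)
    # Variant with or without the trailing slash.
    last_segment = address.rsplit("/", 1)[-1]
    if address.endswith("/"):
        variants.append(address.rstrip("/"))
    elif "/" in address and "." not in last_segment:
        variants.append(address + "/")
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
for v in variants:
    print(v)
```

Each variant can then be submitted as a separate link query to the chosen search system.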

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
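
The difference between the two methods can be illustrated with a small sketch that inspects a web page for a known URL: once in the hyperlink fields (the href attributes of links) and once in the normal text. This is not how any search engine is implemented; the class name and the sample pages are hypothetical, and only Python's standard HTML parser is used.

```python
from html.parser import HTMLParser


class CitationFinder(HTMLParser):
    """Record whether a target URL occurs as a hyperlink and/or in the text."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.in_link = False  # target found in an <a href="..."> attribute
        self.in_text = False  # target found in the text of the page

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and self.target in value:
                    self.in_link = True

    def handle_data(self, data):
        if self.target in data:
            self.in_text = True


def find_citation(html: str, target: str) -> tuple[bool, bool]:
    """Return (found as hyperlink, found in text)."""
    finder = CitationFinder(target)
    finder.feed(html)
    finder.close()
    return finder.in_link, finder.in_text


target = "http://www.computer.com/directory/subdirectory/web-page.html"
page_with_link = f'<p>See <a href="{target}">this page</a>.</p>'
page_with_text = f"<p>Source: {target} (not clickable)</p>"

print(find_citation(page_with_link, target))
print(find_citation(page_with_text, target))
```

A page that mentions the URL only in its text is missed by a search in hyperlink fields, which is why the normal search is a useful complement.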

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 24


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking better reflects relevance.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
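
The two components described above can be illustrated with a minimal sketch. It assumes that the "robot" has already collected the page texts; the URLs, the page texts and the function names are hypothetical, and real Internet indexes are far more elaborate.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a crawler would have collected.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time.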

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, a Google WWW search query was limited to a
maximum of 10 words.)
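
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be sketched with a simplified, PageRank-style calculation. This is only an illustration under stated assumptions: the link graph, the damping value and the function name are hypothetical, and the details of the algorithm that Google actually uses are not described here.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from the link graph; links maps a page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this small graph, C scores highest because many pages point to it, and A scores higher than B and D because the "important" page C points to it.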

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
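
As a small illustration, a query string of this form can be assembled in a program. The function name and the word list are hypothetical; the sketch only builds the text of the query and does not contact any search engine.

```python
def expand_query(words, expand):
    """Put a tilde in front of each word listed in expand; leave the others unchanged."""
    return " ".join("~" + word if word in expand else word for word in words)


print(expand_query(["word1", "word2", "word3", "word4"], {"word2"}))
# prints: word1 ~word2 word3 word4
```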

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
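
Clustering of results "on the fly" can be sketched very roughly as follows. This is not the method used by Vivisimo, which is not described here; it is a naive illustration that groups result snippets under the word they share with the most other snippets. The snippets, the stopword list and the function name are hypothetical.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word that is not in the query."""
    query_words = set(query.lower().split())
    doc_freq = defaultdict(int)  # number of snippets in which each word occurs
    tokenized = []
    for snippet in snippets:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenized.append(words)
        for word in words:
            doc_freq[word] += 1
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Choose the word shared by most snippets; break ties alphabetically.
        heading = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[heading].append(snippet)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "pascal programming language tutorial",
    "pascal compiler for the programming course",
    "blaise pascal philosopher and mathematician",
    "pascal philosopher biography",
]
for heading, group in cluster_results("pascal", snippets).items():
    print(heading, group)
```

The headings are derived from the retrieved snippets only, at search time, so no pre-processing of the documents is needed.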

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
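
The integration step can be sketched as follows, assuming that each primary search system returns a ranked list of URLs. The URL normalisation is deliberately crude and is an assumption of this sketch, as are the function names and the example result lists; real meta-search systems may merge and rank results quite differently.

```python
def normalize(url):
    """Crude normalisation (an assumption): lowercase, drop scheme, 'www.' and trailing slash."""
    u = url.strip().lower()
    for prefix in ("http://", "https://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
    if u.startswith("www."):
        u = u[4:]
    return u.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.dmoz.org/", "http://bubl.ac.uk/link/"]
engine_2 = ["http://dmoz.org", "http://www.lii.org/"]
print(merge_results([engine_1, engine_2]))
```

The second occurrence of the dmoz address is recognised as a repeat and shown only once.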

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
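
The tracking of changed and new pages can be sketched as follows: a fingerprint of each page is stored, and later compared with the fingerprint of the current page text. The sketch assumes that the page texts have already been downloaded; the URLs, the texts and the function names are hypothetical, and the sending of alerts by email is left out.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_pages(stored, current):
    """Compare stored fingerprints with the current page texts.

    Returns (changed, new): URLs whose text differs from the stored
    fingerprint, and URLs that were not seen before. Updates stored.
    """
    changed, new = [], []
    for url, text in current.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new


# Hypothetical usage: two checks of the same page, with modified text.
stored = {}
print(check_pages(stored, {"http://example.org/news": "version 1"}))
print(check_pages(stored, {"http://example.org/news": "version 2"}))
```

The first check reports the page as new; the second check reports it as changed.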

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases (accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
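
The matching of a new book against a stored interest profile can be sketched as follows. This is a hypothetical illustration, not the way Amazon or any other bookshop implements its service: the class names, the matching rule (a keyword in the title OR a shared subject category) and the sample records are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Profile stored on the server: keywords and/or subject categories."""
    keywords: set[str] = field(default_factory=set)
    subject_categories: set[str] = field(default_factory=set)


@dataclass
class BookRecord:
    title: str
    subjects: set[str]


def matches(profile: InterestProfile, book: BookRecord) -> bool:
    """True when a keyword occurs in the title or a subject category is shared."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    wanted = {c.lower() for c in profile.subject_categories}
    category_hit = bool(wanted & {s.lower() for s in book.subjects})
    return keyword_hit or category_hit


# Invented profile and invented records of newly published books.
profile = InterestProfile(keywords={"aquaculture"},
                          subject_categories={"Marine biology"})
new_books = [
    BookRecord("Aquaculture: principles and practice", {"Fisheries"}),
    BookRecord("Coastal ecosystems", {"Marine biology"}),
    BookRecord("Medieval castles", {"History"}),
]

to_announce = [book.title for book in new_books if matches(profile, book)]
print(to_announce)
```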

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
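
The principle of simultaneous searching can be sketched as follows. This is a simplified illustration and not the implementation of Infoball, the Karlsruher Virtueller Katalog or any Z39.50 gateway: the three target functions are invented stand-ins for real catalogues, and one of them fails on purpose to show that the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SearchFn = Callable[[str], list[str]]


def search_library_a(query: str) -> list[str]:
    """Hypothetical library catalogue."""
    catalogue = ["Marine Ecology", "Coastal Management"]
    return [title for title in catalogue if query.lower() in title.lower()]


def search_bookshop_b(query: str) -> list[str]:
    """Hypothetical bookshop database."""
    catalogue = ["marine ecology", "Marine Pollution"]
    return [title for title in catalogue if query.lower() in title.lower()]


def search_unavailable(query: str) -> list[str]:
    """Hypothetical target that is not responding."""
    raise ConnectionError("target system is not responding")


def meta_search(query: str,
                targets: dict[str, SearchFn]) -> tuple[list[str], list[str]]:
    """Query all targets in parallel; return merged titles and failed targets."""
    merged: dict[str, str] = {}
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    # Leaving the "with" block waits until every target has answered or failed.
    for name, future in futures.items():
        try:
            titles = future.result()
        except Exception:
            failed.append(name)
            continue
        for title in titles:
            # Same title from two targets is kept once (case-insensitive).
            merged.setdefault(title.casefold(), title)
    return list(merged.values()), failed


TARGETS = {
    "library A": search_library_a,
    "bookshop B": search_bookshop_b,
    "offline catalogue": search_unavailable,
}

titles, failed = meta_search("marine", TARGETS)
print(titles)
print(failed)
```

The single query string passed to every target also illustrates why the search options are limited: only what all target systems understand can be offered.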

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combination of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text, of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
after a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
(diagram: relations of a Term to other terms)
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
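
The use of a thesaurus to prepare a query can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny thesaurus entry for "sea" is invented, and the OR syntax with quoted phrases is only one common query convention, so the help pages of the target database remain the authority on the real syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


# Invented entry, for illustration only.
THESAURUS = {
    "sea": ThesaurusEntry(
        broader=["body of water"],
        narrower=["inland sea"],
        related=["coast"],
        used_for=["ocean"],
    ),
}


def expand_query(term: str,
                 thesaurus: dict[str, ThesaurusEntry],
                 include_narrower: bool = True) -> str:
    """Combine a term with its synonyms (and narrower terms) using OR."""
    terms = [term]
    entry = thesaurus.get(term)
    if entry is not None:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; broader and related terms are left out of the sketch because they tend to widen the search at the cost of precision.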

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
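
Both directions of citation searching can be sketched as follows. This is an illustration with invented document identifiers and reference lists, not a description of how any citation database is built: following the reference lists gives the snowball search towards older publications, and inverting the same data gives a small citation index that points to more recent publications.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1990"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def snowball(seed: str, references: dict[str, list[str]]) -> set[str]:
    """Collect all older documents reachable through reference lists."""
    found: set[str] = set()
    to_visit = [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def build_citation_index(references: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the reference lists: cited document -> documents that cite it."""
    index: dict[str, set[str]] = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    return dict(index)


older = snowball("seed-2000", REFERENCES)
citation_index = build_citation_index(REFERENCES)
newer = citation_index["seed-2000"]

print(sorted(older))   # publications found by looking back in time
print(sorted(newer))   # publications found by looking forward in time
print(len(newer))      # number of citations received by the seed document
```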

211

***-

Information retrieval:
using citations (scheme)
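(Scheme on a time axis running from past through now to future:
starting from the seed document, snowball citation searching
leads back to the past, and citation indexing leads forward
towards the future.)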

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
cite a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax and query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
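
The variants listed above can be generated from any one of them, as in the following sketch. The function name is invented, and the assumption that the default page is called index.html comes only from the example on this slide; other servers may use other default names.

```python
def link_target_variants(url: str, index_name: str = "index.html") -> list[str]:
    """Return the spellings under which other pages may link to the same site."""
    base = url
    if base.endswith("/" + index_name):
        base = base[: -len(index_name)]   # drop the default page name
    base = base.rstrip("/")               # drop the trailing slash, if any
    return [f"{base}/{index_name}", f"{base}/", base]


variants = link_target_variants("http://web-server-computer.country/website/")
for variant in variants:
    print(variant)   # each variant is submitted as a separate link search
```

The counts found for the separate variants then have to be combined by the searcher, because a search engine may treat each spelling as a different link target.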

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
NOT ONLY an Internet index / search engine with specialised
searching in the hyperlink fields of web pages,
but ALSO a more “normal” search for web documents/pages
that contain words or exact phrases that occur in the URL
of that particular web document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
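
Such a “normal” search for a URL as an exact phrase can be prepared as in the following sketch. The address search.example and the parameter name q are hypothetical placeholders, not the interface of a real search engine; only the quoting of the phrase and the encoding of the request follow general WWW conventions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def exact_phrase_query(url: str) -> str:
    """Wrap the URL in double quotes so that it is searched as one phrase."""
    return f'"{url}"'


def search_request_url(search_form_url: str, url_to_find: str,
                       parameter: str = "q") -> str:
    """Build the request for a (hypothetical) search form."""
    query = urlencode({parameter: exact_phrase_query(url_to_find)})
    return f"{search_form_url}?{query}"


PAGE = "http://www.computer.com/directory/subdirectory/web-page.html"
request = search_request_url("https://search.example/find", PAGE)
print(request)

# Decoding the request again shows the phrase that the search system receives.
recovered = parse_qs(urlsplit(request).query)["q"][0]
print(recovered)
```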

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility of using the system in this way
may be confusing, as these directories are not real full-text
Internet indexes like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.
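
The idea of ranking directory entries by the number of links they receive can be sketched as follows. This is only a minimal illustration under stated assumptions: the site names, the LINKS data and the function rank_by_inlinks are hypothetical, and the real link analysis by Google is more elaborate than a simple count.

```python
from collections import Counter

# Hypothetical links observed on the WWW: (source page, target site).
LINKS = [
    ("p1", "site-a"), ("p2", "site-a"), ("p3", "site-a"),
    ("p1", "site-b"),
    ("p2", "site-c"), ("p4", "site-c"),
]

def rank_by_inlinks(category_sites, links):
    """Sort the sites of one directory category by number of incoming links."""
    inlinks = Counter(target for _source, target in links)
    # Most-linked sites first; ties are broken alphabetically.
    return sorted(category_sites, key=lambda site: (-inlinks[site], site))

print(rank_by_inlinks(["site-b", "site-c", "site-a", "site-d"], LINKS))
```

An alphabetical listing, as in DMOZ, would correspond to a plain sorted(category_sites).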

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
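
The two components described above (an automatically built index and a search function on top of it) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any real search engine: the page texts, the example.org URLs and the names build_index and search are hypothetical, and a real robot would collect the pages over the Internet instead of reading them from a dictionary.

```python
from collections import defaultdict

# Hypothetical pages already collected by a "robot" (URL -> text).
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "history of the North Sea coast",
}

def build_index(pages):
    """Build an inverted index: each word points to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "marine sea")))
```

The query is answered from the pre-built index only, which is why the pages themselves do not have to be visited at search time.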

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
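
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified iterative link analysis. This is a sketch in the spirit of published link-analysis methods, not the actual ranking algorithm of Google: the function link_scores, the damping value 0.85, the number of iterations and the four-page GRAPH are assumptions chosen for the example.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores high when many pages,
    or pages that score high themselves, link to it.
    graph maps each page to the list of pages it links to."""
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link structure: A, B and D link to C; C links to A.
GRAPH = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores = link_scores(GRAPH)
print(sorted(scores, key=scores.get, reverse=True))
```

In this example page C collects links from three pages and ends up on top, while page A ranks above B and D because its single incoming link comes from the high-scoring page C.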

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
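
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration of the principle: the synonym table, the function expand_query and the OR-group output format are hypothetical, and nothing is implied about how Google implements the tilde operator internally.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            # Words without a tilde are left untouched.
            parts.append(token)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table contains a synonym for it.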

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
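
The two quality measures named above, recall and precision, can be computed as follows when the set of relevant documents is known. This is a minimal sketch; the function name precision_recall and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.
    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant in total.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]))
```

In practice the complete set of relevant documents on the WWW is not known, so recall can only be estimated.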

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
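
For readers who are not familiar with truncation, the sketch below shows what a retrieval system that does support it would do with a query term such as librar*. The vocabulary and the function truncation_match are hypothetical; the matching itself uses the standard Python module fnmatch.

```python
import fnmatch

# Hypothetical list of words that occur in the index of a search system.
VOCABULARY = ["library", "libraries", "librarian", "liberal", "book"]

def truncation_match(pattern, vocabulary):
    """Return the index words matched by a truncated term such as 'librar*'."""
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))

print(truncation_match("librar*", VOCABULARY))
```

Without this feature, the searcher has to type the variants (library, libraries, librarian...) explicitly and combine them in the query.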

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
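
A proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched as follows. This is a simplified illustration: the function near and its distance parameter are hypothetical, and punctuation is not handled.

```python
def near(text, word1, word2, distance=3):
    """True when word1 and word2 occur within `distance` words of each other."""
    words = text.lower().split()
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)

TEXT = "marine biology of the North Sea"
print(near(TEXT, "north", "sea", distance=1))
print(near(TEXT, "marine", "sea", distance=3))
```

A plain AND query would accept both word pairs, because it only checks that the words occur somewhere in the document.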

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
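
Clustering on the fly can be illustrated with a very simple approach: the words that occur most often in the retrieved snippets, apart from the query words themselves, serve as cluster headings. This sketch is not the method used by Vivisimo, which is not described here; the function cluster_results, the stop word list and the snippets are hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "by", "for"}

def cluster_results(query, snippets, max_clusters=2):
    """Group result snippets under their most frequent non-query words."""
    query_words = set(query.lower().split())
    tokenised = [set(s.lower().split()) - STOPWORDS - query_words for s in snippets]
    counts = Counter(word for words in tokenised for word in words)
    # Most frequent words become headings; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    labels = [word for word, _count in ordered[:max_clusters]]
    clusters = {label: [] for label in labels}
    clusters["other"] = []
    for snippet, words in zip(snippets, tokenised):
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(snippet)
    return clusters

# Hypothetical snippets returned for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal pressure unit",
]

for heading, items in cluster_results("pascal", SNIPPETS).items():
    print(heading, items)
```

Everything is computed from the result list of one search, so no pre-processing of the documents is needed before the query arrives.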

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
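
The integration step with removal of repeated results can be sketched as follows. This is one possible approach under stated assumptions, not the method of any particular meta-search system: the function merge_results, the scoring by 1/(rank) and the crude URL normalisation (lower case, trailing slash removed) are hypothetical choices.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several search systems into one list,
    dropping repeated URLs. A URL found by several systems, or ranked
    high by one of them, moves towards the top."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results):
            key = url.rstrip("/").lower()  # crude normalisation for duplicate detection
            entry = scores.setdefault(key, {"url": url, "score": 0.0})
            entry["score"] += 1.0 / (rank + 1)
    ordered = sorted(scores.values(), key=lambda e: (-e["score"], e["url"]))
    return [entry["url"] for entry in ordered]

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://a.org/", "http://b.org/", "http://c.org/"]
ENGINE_2 = ["http://b.org", "http://d.org/", "http://a.org/"]

print(merge_results([ENGINE_1, ENGINE_2]))
```

The merged list contains each address only once, although two of them were returned by both systems.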

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
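
The difference between a modified page and a new page can be made concrete with a small sketch. It assumes that the tracking system stores a fingerprint (hash) of each page it has seen before; the functions fingerprint and detect_changes, the example.org URLs and the page texts are hypothetical, and fetching the pages from the WWW is left out.

```python
import hashlib

def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the pages as they are now.
    previous: dict URL -> fingerprint from the last visit
    current_pages: dict URL -> current text
    Returns (new URLs, changed URLs, updated fingerprints)."""
    new_urls, changed_urls, updated = [], [], {}
    for url, text in current_pages.items():
        fp = fingerprint(text)
        updated[url] = fp
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != fp:
            changed_urls.append(url)
    return new_urls, changed_urls, updated

# Hypothetical state after a first visit, followed by a second visit.
first_visit = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
second_visit = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
new_urls, changed_urls, _ = detect_changes(first_visit, second_visit)
print("new:", new_urls)
print("changed:", changed_urls)
```

An alert service would then send the lists of new and changed addresses to the subscriber by email.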

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for a
particular application is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
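
A stored interest profile of this kind can be sketched as follows. This is only an illustration under stated assumptions: the classes `Book` and `InterestProfile` and the function `alerts` are hypothetical and do not describe how Amazon or any other bookshop implements its service.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    subjects: set = field(default_factory=set)  # subject categories assigned to the book


@dataclass
class InterestProfile:
    keywords: set = field(default_factory=set)  # words to look for in titles
    subjects: set = field(default_factory=set)  # subject categories / fields of interest

    def matches(self, book: Book) -> bool:
        """True when a keyword occurs in the title or a subject category coincides."""
        title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
        keyword_hit = bool({k.lower() for k in self.keywords} & title_words)
        subject_hit = bool(
            {s.lower() for s in self.subjects} & {s.lower() for s in book.subjects}
        )
        return keyword_hit or subject_hit


def alerts(profile: InterestProfile, new_books: list) -> list:
    """Return the titles of newly published books that fit the profile."""
    return [book.title for book in new_books if profile.matches(book)]
```

Keyword matching is done on whole title words only; a real service would probably also look at descriptions and handle word variants.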

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
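
The principle of simultaneous searching can be sketched as below. This is a hedged illustration only: each catalogue is represented by a hypothetical Python function that takes a query and returns (ISBN, title) pairs, whereas real services such as those listed on the following slide talk to remote systems over protocols like Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query: str, catalogues: dict):
    """Send one query to several catalogues in parallel and merge the results.

    catalogues : name -> callable(query) returning a list of (isbn, title) pairs.
    Returns (merged, failures): merged maps ISBN -> title and source names,
    failures maps the name of an unavailable catalogue -> error message.
    """
    def run(item):
        name, search = item
        try:
            return name, search(query), None
        except Exception as exc:  # a target system may be down or misbehave
            return name, [], str(exc)

    merged, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        for name, records, error in pool.map(run, catalogues.items()):
            if error is not None:
                failures[name] = error
                continue
            for isbn, title in records:
                entry = merged.setdefault(isbn, {"title": title, "sources": []})
                entry["sources"].append(name)  # the same book may occur in several catalogues
    return merged, failures
```

The sketch shows both points made above: coverage grows because results are merged, but the outcome depends on which target systems answer, and only the simplest common query form (a single string) is supported.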

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled
vocabulary, the searcher can first consult one or several
thesaurus systems to find additional and more suitable words
and terms, and then use these to formulate a query for that
database (to increase recall and precision).
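
This use of a thesaurus can be sketched as a simple query expansion. The entries in `THESAURUS` below are toy examples invented for illustration, and the `OR` syntax is an assumption; the query language of the database you actually search may differ.

```python
# Toy thesaurus: preferred term -> relations (BT, NT, RT, UF). Hypothetical entries.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR query from a term plus the thesaurus terms for the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms (UF) and narrower terms (NT) tends to increase recall; replacing a broad term by one narrower term tends to increase precision instead.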

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
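
Both directions can be sketched on a small citation graph. The data in `REFERENCES` is invented for illustration, and the function names are hypothetical; a real citation index is built from the reference lists of many thousands of publications.

```python
# Document -> list of documents that it cites (hypothetical toy data).
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "new1"],
}


def snowball(seed: str, references: dict, depth: int = 2) -> set:
    """Backward search: follow reference lists from the seed to older publications."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {
            cited for doc in frontier for cited in references.get(doc, [])
        } - found - {seed}
        found |= frontier
    return found


def build_citation_index(references: dict) -> dict:
    """Forward search: invert the reference lists, giving cited document -> citing documents."""
    index = {}
    for doc, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, set()).add(doc)
    return index
```

Here `snowball("seed", REFERENCES)` collects the older documents, while `build_citation_index(REFERENCES)["seed"]` gives the more recent documents that cite the seed.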

211

***-

Information retrieval:
using citations (scheme)
[Diagram: a time axis (Past, Now, Future). Starting from the seed
document, snowball citation searching points to the past and
citation indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
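
Estimating these numbers from the results of a citation search can be sketched as a simple count. The input format is an assumption made for illustration: each citing publication is given with the list of items it cites, and `authors_of` is a hypothetical lookup table from article to authors.

```python
from collections import Counter


def citation_counts(citing_records: dict, authors_of: dict):
    """Count citations received per article and per author.

    citing_records : citing publication -> list of cited articles
    authors_of     : cited article -> list of its authors
    A publication that cites the same article several times is counted once.
    """
    per_article = Counter()
    for cited_list in citing_records.values():
        per_article.update(set(cited_list))
    per_author = Counter()
    for article, count in per_article.items():
        for author in authors_of.get(article, []):
            per_author[author] += count
    return per_article, per_author
```

Such counts remain estimates: they depend on the coverage of the database searched and on how consistently author names are recorded.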

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
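
Generating these variants can be automated. The sketch below is a minimal illustration with a hypothetical function name; it assumes that the known URL points to a directory or to its index page, and it does not build the link query itself, since that syntax depends on the search system.

```python
def url_variants(url: str) -> list:
    """Return the URL plus the variants with and without index page and trailing slash."""
    variants = [url]
    trimmed = url
    for index_name in ("index.html", "index.htm"):
        if trimmed.endswith("/" + index_name):
            trimmed = trimmed[: -len(index_name)]  # keeps the trailing slash
            break
    with_slash = trimmed if trimmed.endswith("/") else trimmed + "/"
    without_slash = with_slash.rstrip("/")
    for candidate in (with_slash, without_slash):
        if candidate not in variants:
            variants.append(candidate)
    return variants
```

Each variant is then submitted as a separate link search, and the result lists are combined.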

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
NOT ONLY an Internet index / search engine with specialised
searching in the hyperlink fields of web pages,
but ALSO a more “normal” search for web documents/pages
that contain words or exact phrases occurring in the URL of
that document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources / systems / services
directly useful

• Secondary sources / systems / services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific to the particular directory.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• It has since added many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• Its contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
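
The ranking idea on this slide can be illustrated with a toy sketch in Python. It only counts incoming links; the site names and the helper `rank_by_inlinks` are made up for illustration, and the sketch does not reproduce Google's actual method.

```python
from collections import Counter

def rank_by_inlinks(links):
    """Order the linked-to sites by the number of distinct links they receive.

    `links` is a list of (source, target) pairs; all names are hypothetical.
    Sites that receive no link at all do not appear in the result.
    """
    # Count each (source, target) pair once and ignore self-links.
    inlinks = Counter(target for source, target in set(links) if source != target)
    # Sort by descending in-link count, then alphabetically for a stable order.
    return sorted(inlinks, key=lambda site: (-inlinks[site], site))

links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("d.example", "c.example"),
]
print(rank_by_inlinks(links))  # ['c.example', 'b.example']
```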

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Diagram labels:
• the complete WWW
• a global subject directory, which can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
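
The two components described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the pages are hard-coded instead of being collected by a robot, the URLs are invented, and the query logic (every query word must occur) is a choice made for this sketch, not a description of any particular search engine.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected earlier.
pages = {
    "http://example.org/tides": "Tides and currents in the North Sea",
    "http://example.org/fish": "Fish stocks in the North Sea",
}
index = build_index(pages)
print(sorted(search(index, "north sea fish")))  # ['http://example.org/fish']
```

The point of the sketch is that a query is answered from the stored index only; no page is fetched from the Internet at query time.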

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
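
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch under assumptions: the slide does not name the algorithm, the damping value 0.85 and the tiny link graph are chosen for illustration, and the real system is certainly more elaborate.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    A page without outgoing links spreads its score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass an equal share of this page's score to each target.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores

# Hypothetical graph: "hub" receives three links; "d" and "e" receive one each,
# but "d" is linked from the important page "hub" and "e" only from "a".
links = {"a": ["hub", "e"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "d": [], "e": []}
scores = link_scores(links)
print(scores["hub"] > scores["a"], scores["d"] > scores["e"])
```

In this graph "d" and "e" each receive exactly one link, yet "d" ends up with the higher score because its single link comes from a page that is itself highly ranked.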

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
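
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus, the function name `expand_query` and the Boolean syntax of the output are assumptions made for this illustration; a real thesaurus system and a real retrieval system will each have their own vocabulary and query syntax.

```python
# Hypothetical mini-thesaurus; a real one would come from a thesaurus system.
THESAURUS = {
    "fish": ["fishes", "ichthyofauna"],
    "pollution": ["contamination"],
}

def expand_query(words, thesaurus=THESAURUS):
    """Combine each word with its synonyms using OR, and the groups using AND."""
    groups = []
    for word in words:
        terms = [word] + thesaurus.get(word, [])
        if len(terms) > 1:
            groups.append("(" + " OR ".join(terms) + ")")
        else:
            groups.append(word)
    return " AND ".join(groups)

print(expand_query(["fish", "pollution", "estuary"]))
# (fish OR fishes OR ichthyofauna) AND (pollution OR contamination) AND estuary
```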

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
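
For readers who build queries in a script, the notation above can be generated with a small helper. The function name `add_synonym_operator` is hypothetical; the sketch only produces the query string and says nothing about how the search engine processes it.

```python
def add_synonym_operator(query, words_to_expand):
    """Prefix the selected query words with a tilde, as described above."""
    selected = {word.lower() for word in words_to_expand}
    return " ".join(
        "~" + word if word.lower() in selected else word
        for word in query.split()
    )

print(add_synonym_operator("word1 word2 word3 word4", ["word2"]))
# word1 ~word2 word3 word4
```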

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
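
Recall and precision, the two measures named above, can be computed as in the following sketch. The document identifiers are invented, and the set of relevant documents is assumed to be known, which in practice it rarely is.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).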

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
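
To make clear what is missing, the following sketch shows what manual truncation means in a system that does support it: a query such as `ocean*` matches every indexed word that starts with "ocean". The vocabulary and the function name are assumptions for illustration; stemming, which reduces words to a common root, is not shown.

```python
def truncation_search(stem, vocabulary):
    """Return the words that start with `stem`, as a query like stem* would."""
    stem = stem.lower()
    return sorted(word for word in vocabulary if word.lower().startswith(stem))

vocabulary = ["ocean", "oceans", "oceanography", "oceanic", "sea"]
print(truncation_search("ocean", vocabulary))
# ['ocean', 'oceanic', 'oceanography', 'oceans']
```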

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
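
A proximity operator such as NEAR, in systems that offer one, requires two words to occur close to each other in the text. The following is a minimal sketch of that test; the window of 3 words, the sample sentence and the function name `near` are assumptions for illustration.

```python
def near(text, word_a, word_b, max_distance=3):
    """True if the two words occur within `max_distance` words of each other."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

text = "Marine pollution affects fish stocks along the Belgian coast."
print(near(text, "pollution", "fish"))  # True
print(near(text, "marine", "coast"))    # False
```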

96

**--

Meta-search systems:
scheme 1
Diagram labels:
• User (In / Out)
• Client computer + WWW client program
• WWW server computer
• Internet / WWW
• WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
Diagram labels:
• User (In / Out)
• Client computer + multi-threaded Internet search client program
• Internet / WWW
• WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Diagram labels:
• User (In / Out)
• Client computer + WWW client program
• WWW server computer
• Internet / WWW
• WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
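
Grouping results on the fly can be illustrated with a toy sketch that uses the "pascal" query mentioned earlier for Teoma. It is not Vivisimo's method, which is not described here: the stopword list, the sample snippets and the rule "label each result with its most widely shared term" are all assumptions made for this illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "was"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared term they contain."""
    query_words = set(query.lower().split())
    # Tokenise each snippet, dropping stopwords and the query words themselves.
    tokens = [
        {w for w in re.findall(r"[a-z]+", s.lower())
         if w not in STOPWORDS and w not in query_words}
        for s in snippets
    ]
    # Count in how many snippets each remaining term occurs.
    frequency = Counter(w for words in tokens for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokens):
        # Label = the term found in the most snippets (ties: alphabetical order).
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Blaise Pascal philosopher biography",
    "Pascal the philosopher and mathematician",
]
for label, group in cluster_results("pascal", snippets).items():
    print(label, group)
```

The grouping is computed only from the results returned for this one query, which is what "on the fly" means here: nothing about the documents is prepared before the search.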

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Diagram labels:
• User (In / Out)
• Client computer + multi-threaded Internet search client program
• Internet / WWW
• WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
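
The integration step can be sketched as follows: the same query is sent to several search systems in parallel, and the result lists are merged while repeated results are removed. The two "engines" are hypothetical stand-ins that return fixed lists, so the sketch makes no network access and does not describe any real meta-search product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines; no network access is made.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # map() returns the result lists in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("north sea", [engine_one, engine_two]))
# ['http://example.org/a', 'http://example.org/b', 'http://example.org/c']
```

A real system would also have to reconcile the different rankings of the individual engines; this sketch simply keeps the order in which the results arrive.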

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Diagram labels:
• Internet / WWW
• telnet, ftp, ...
• Databases and file archives accessible through the Internet (CGI, ASP, ...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
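The tracking idea above can be sketched in a few lines. This is only a minimal illustration, not how Copernic or any alert service actually works: the function names `fingerprint` and `check_pages` are hypothetical, and the page texts are assumed to have been retrieved already by some other means.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored: dict, fetched: dict) -> dict:
    """Classify each fetched URL as 'new', 'changed' or 'unchanged'.

    stored  maps URL -> fingerprint kept from the previous run (updated in place)
    fetched maps URL -> page text retrieved in this run (assumed input)
    """
    report = {}
    for url, text in fetched.items():
        digest = fingerprint(text)
        if url not in stored:
            report[url] = "new"
        elif stored[url] != digest:
            report[url] = "changed"
        else:
            report[url] = "unchanged"
        stored[url] = digest  # remember the fingerprint for the next run
    return report
```

An alert service would run such a check on a schedule and send an email for every URL reported as new or changed.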

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?
143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
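How a stored interest profile could be matched against newly published books can be sketched as follows. This is an assumption-laden illustration and not the method used by Amazon or any other bookshop: the record fields (`title`, `category`) and the function names are hypothetical.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """Return True when a book fits the profile by keyword or by subject category."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def alerts(new_books: list, profile: dict) -> list:
    """Return the titles of the new books that should trigger an alert."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]
```

A profile based on keywords catches specific topics, while a profile based on subject categories is broader and depends on how well the bookshop classifies its books.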

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
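The points above (dependence on the target systems, good coverage, limited search options) can be illustrated with a small sketch of a meta-search. It is a simplification under stated assumptions: each target system is represented by a hypothetical function that takes one query string and returns a list of titles, and real services such as those using Z39.50 are far more elaborate.

```python
def normalise(title: str) -> str:
    """Lower-case a title and collapse whitespace so that duplicates can be merged."""
    return " ".join(title.lower().split())


def meta_search(query: str, sources: dict):
    """Send one query to several sources and merge the results.

    sources maps a source name to a function(query) -> list of titles.
    A source that fails is skipped and reported as unavailable.
    """
    merged = {}
    unavailable = []
    for name, search in sources.items():
        try:
            titles = search(query)
        except Exception:
            unavailable.append(name)  # the result depends on target availability
            continue
        for title in titles:
            record = merged.setdefault(normalise(title), {"title": title, "found_in": []})
            record["found_in"].append(name)
    return list(merged.values()), unavailable
```

Only a plain query string is passed on, which reflects why search options in such services are limited to what all target systems have in common.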

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
(relations of a given Term to other terms)
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
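The procedure described above (first look up terms in a thesaurus, then use them in the query) can be sketched as follows. The tiny thesaurus entry for "sea" is invented for illustration only, the function name `expand_query` is hypothetical, and the OR syntax is an assumption that must be checked against the help pages of the database to be searched.

```python
# Hypothetical thesaurus entry using the relations UF, NT, BT and RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "NT": ["bay", "gulf"],     # narrower terms
        "BT": ["body of water"],   # broader terms
        "RT": ["coast"],           # related terms
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR query from a term plus the thesaurus terms of the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms in quotes so that they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms tends to increase recall; adding broader or related terms increases recall further but can lower precision, so the choice of relations is left to the searcher.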

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
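The two directions of citation searching can be made concrete with a toy citation network. This is a sketch with invented document names; the dictionary `CITES` and the function names are hypothetical, and a real citation index holds millions of such reference lists.

```python
# Hypothetical data: each document maps to the list of documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def backward(doc: str, cites: dict) -> list:
    """Snowball searching: follow reference lists back in time, step by step."""
    found = []
    queue = list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))  # references of references
    return found


def forward(doc: str, cites: dict) -> list:
    """Citation index lookup: find the more recent documents that cite doc."""
    return sorted(d for d, references in cites.items() if doc in references)
```

Backward searching needs only the documents themselves, whereas forward searching needs an index built over the reference lists of many documents, which is what a citation index provides.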

211

***-

Information retrieval:
using citations (scheme)
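[Scheme: a time axis running from past, through now, to future,
with the seed document as starting point; snowball citation
searching points towards the past, citation indexing points
towards the future.]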

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
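How such counts could be estimated from the results of a citation search is sketched below. The input format and the function names are assumptions made for illustration; the counts obtained in practice depend entirely on the coverage of the database that is searched.

```python
from collections import Counter


def citation_counts(reference_lists: dict) -> Counter:
    """Count the citations received by each document.

    reference_lists maps a citing document to the list of documents it cites.
    A citing document is counted once per cited document, even when the
    reference list repeats an entry.
    """
    counts = Counter()
    for references in reference_lists.values():
        counts.update(set(references))
    return counts


def author_counts(article_counts: Counter, authors_of: dict) -> Counter:
    """Add up the citations received by the articles of each author."""
    totals = Counter()
    for article, number in article_counts.items():
        for author in authors_of.get(article, []):
            totals[author] += number
    return totals
```

The result is only one factor in an evaluation, as stated above, and it is an estimate limited to the citing documents that the search system has indexed.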

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
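The variants listed above can be generated automatically before submitting the searches. This is a minimal sketch: the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called index.html or index.htm, which is not true for every web server.

```python
def url_variants(url: str) -> list:
    """Return three spellings of one address: with index file, with slash, without."""
    rest = url.split("//", 1)[-1]  # drop "http:" or any other scheme
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]  # keep the directory part only
            break
    base = rest.rstrip("/")
    return [
        "//" + base + "/index.html",
        "//" + base + "/",
        "//" + base,
    ]
```

Each variant is then used in a separate link search, with the query syntax taken from the help pages of the chosen search system.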

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, using the system in this way
may be confusing, because these directories are not real full-text Internet indexes like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it started offering many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
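
The mechanism described above can be sketched in a few lines of code. This is only a minimal illustration, not the implementation of any real search engine: the page texts and URLs are invented, the "robot" step is replaced by a ready-made dictionary of collected pages, and the query is assumed to combine words with an implicit AND.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might already have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build the index: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is why such systems can respond quickly.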

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
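
The idea that a page ranks higher when many pages, and "important" pages, point to it can be illustrated with a simplified link-analysis iteration. This sketch is not Google's actual algorithm: the link graph is invented, and the damping factor of 0.85 and the number of iterations are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=100):
    """Score pages by the links they receive.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D point to C; C points back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this small graph, C scores highest because three pages point to it, and A scores above B and D because its single incoming link comes from the "important" page C.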

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
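
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus and the OR syntax of the resulting query are assumptions made for illustration; a real thesaurus is far larger and each retrieval system has its own query syntax.

```python
# Hypothetical mini-thesaurus; a real thesaurus would be far larger.
THESAURUS = {
    "fish": ["fishes", "ichthyofauna"],
    "pollution": ["contamination"],
}

def expand_query(query, thesaurus):
    """Rewrite each query word as an OR-group of the word and its synonyms."""
    parts = []
    for word in query.lower().split():
        synonyms = thesaurus.get(word, [])
        if synonyms:
            parts.append("(" + " OR ".join([word] + synonyms) + ")")
        else:
            # Words without a thesaurus entry are kept unchanged.
            parts.append(word)
    return " ".join(parts)

print(expand_query("fish pollution estuary", THESAURUS))
# prints: (fish OR fishes OR ichthyofauna) (pollution OR contamination) estuary
```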

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
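
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are really relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall about 0.67).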

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
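
To show what the missing truncation feature would do, the sketch below expands a right-truncated term against the vocabulary of an index. The asterisk as truncation symbol and the word list are assumptions for illustration; this is not a feature or the syntax of any particular system.

```python
# Hypothetical vocabulary of words that occur in an index.
VOCABULARY = [
    "commute", "compute", "computer", "computers", "computing",
    "ocean", "oceanography",
]

def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'comput*' into matching words."""
    if term.endswith("*"):
        stem = term[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    # A term without truncation symbol only matches itself.
    return [term] if term in vocabulary else []

print(expand_truncation("comput*", VOCABULARY))
# prints: ['compute', 'computer', 'computers', 'computing']
```

Without such a feature, the searcher has to type each word variant separately and combine the variants with OR.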

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
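
A proximity operator such as NEAR tests whether two words occur close to each other in a text. The following is a minimal sketch under assumptions: the maximum distance of 5 words is an arbitrary choice, and the example sentence is invented.

```python
import re

def near(text, word1, word2, max_distance=5):
    """Return True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

TEXT = "Marine pollution affects the biology of coastal waters"
print(near(TEXT, "marine", "biology"))                  # 4 words apart
print(near(TEXT, "marine", "waters", max_distance=3))   # 7 words apart
```

A plain AND query would accept both pairs; the proximity test accepts only the first, which tends to increase precision.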

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
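
Clustering search results on the fly can be sketched with a very simple rule: group each retrieved snippet under the word that it shares with the most other snippets. This is not the method used by Vivisimo, which is not described here; the stop word list, the labelling rule and the snippets are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the word they share with the most other snippets."""
    query_words = set(query.lower().split())
    tokenized = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenized.append(words - STOPWORDS - query_words)
    # Count in how many snippets each remaining word occurs.
    doc_freq = Counter(word for words in tokenized for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Ties are broken alphabetically to keep the grouping deterministic.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets retrieved for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Writings of the philosopher Blaise Pascal",
]
for label, group in cluster_results("pascal", SNIPPETS).items():
    print(label, group)
```

The grouping is computed only from the snippets returned for this query, so no pre-processing of the documents is needed.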

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
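
Integrating results from several primary search systems and removing repeated results can be sketched as follows. The result lists are invented, the interleaving by rank is only one possible merging strategy, and treating URLs that differ only in a trailing slash as identical is a simplifying assumption.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Only the trailing slash is normalised; letter case is kept,
                # because paths in URLs can be case-sensitive.
                key = results[rank].rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_A = ["http://example.org/x", "http://example.org/y/"]
ENGINE_B = ["http://example.org/y", "http://example.org/z"]
print(merge_results([ENGINE_A, ENGINE_B]))
```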

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
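
A minimal sketch of how tracking modifications in one existing page could work, assuming the service stores a fingerprint of the page text at each visit and compares it with the next visit. The function names and the sample texts are hypothetical; a real alert service would also have to fetch the page and decide which changes are worth reporting.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, current_text: str) -> bool:
    """True when the current page text differs from the stored version."""
    return fingerprint(current_text) != previous_fingerprint


# Hypothetical example: the text of one page at three different visits.
stored = fingerprint("Conference programme\nOostende, May 2005")
unchanged = has_changed(stored, "Conference   programme Oostende, May 2005")
changed = has_changed(stored, "Conference programme Oostende, June 2005")
```

Only the fingerprint has to be kept between visits, not the full page. Finding completely new pages needs a different approach, such as repeating a stored query against an Internet index.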

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
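
The matching of a new book against stored interest profiles can be sketched as follows. This is only an illustration under stated assumptions: the `Profile` structure, the matching rule (any title word or the subject category) and the sample data are hypothetical and do not describe how any particular bookshop implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Interest profile of one user, as stored on the server (hypothetical structure)."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile: Profile, title: str, category: str) -> bool:
    """A new book matches when its category or any word of its title is in the profile."""
    title_words = {word.strip(".,:;()").lower() for word in title.split()}
    return category.lower() in profile.categories or bool(title_words & profile.keywords)


def users_to_alert(profiles, title, category):
    """Return the users whose profile fits the newly published book."""
    return [p.user for p in profiles if matches(p, title, category)]


# Hypothetical profiles and one hypothetical new book.
profiles = [
    Profile("ann", keywords={"aquaculture"}),
    Profile("bob", categories={"marine science"}),
    Profile("eve", keywords={"thesaurus"}),
]
alerted = users_to_alert(profiles, "Aquaculture: an introduction", "Marine science")
```

Here `ann` is alerted through a keyword and `bob` through a subject category, which corresponds to the two forms of profile listed above.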

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
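
A minimal sketch of simultaneous searching, assuming each target catalogue can be called as a function that returns records with an ISBN. The three target functions and their records are hypothetical stand-ins; real services would send the query to remote systems, for instance over Z39.50 or HTTP.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library(query):
    """Stand-in for a library catalogue search (hypothetical data)."""
    records = [{"isbn": "111", "title": "Marine ecology", "source": "library"}]
    return [r for r in records if query.lower() in r["title"].lower()]


def search_bookshop(query):
    """Stand-in for a bookshop catalogue search (hypothetical data)."""
    records = [
        {"isbn": "111", "title": "Marine ecology", "source": "bookshop"},
        {"isbn": "222", "title": "Marine pollution", "source": "bookshop"},
    ]
    return [r for r in records if query.lower() in r["title"].lower()]


def search_unavailable(query):
    """Stand-in for a target system that is down."""
    raise ConnectionError("target system not available")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results by ISBN."""
    def safe(target):
        try:
            return target(query)
        except ConnectionError:
            return []  # an unavailable target simply contributes nothing

    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(safe, targets))

    merged = {}
    for results in result_lists:
        for record in results:
            merged.setdefault(record["isbn"], record)  # keep the first copy of each book
    return list(merged.values())


hits = meta_search("marine", [search_library, search_bookshop, search_unavailable])
```

The sketch also shows why search options are limited: only a plain query that every target understands can be passed on, and an unavailable target silently reduces the coverage.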

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/PubMed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
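
The use of a thesaurus before querying a database can be sketched as follows. The thesaurus entry is a hypothetical example, and combining terms with OR and quoting phrases is an assumption about the query syntax of the target database; check the help pages of the system that you actually use.

```python
# Hypothetical thesaurus entry using the relations UF, BT, NT and RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine science"],
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found through the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    return terms


def build_query(term, relations=("UF", "NT")):
    """Combine the term and its expansions with OR; phrases are put in quotes."""
    quoted = [f'"{t}"' if " " in t else t for t in expand(term, relations)]
    return " OR ".join(quoted)


query = build_query("sea")
broader_query = build_query("sea", relations=("BT",))
```

Adding synonyms and narrower terms with OR mainly increases recall; replacing an ambiguous word by a more specific term from the thesaurus is the usual way to increase precision.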

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
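
The two directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists are hypothetical; a citation index provides the same kind of data on a much larger scale.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, references):
    """Follow references backwards in time, repeatedly (the snowball effect)."""
    found, to_visit = [], list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found


def citing_documents(seed, references):
    """Invert the reference lists, as a citation index does, to find citing documents."""
    return sorted(doc for doc, cited in references.items() if seed in cited)


older_docs = snowball("seed", REFERENCES)
newer_docs = citing_documents("seed", REFERENCES)
```

Snowballing needs only the reference lists of the documents at hand, whereas finding the more recent citing documents requires an index built over many publications.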

211

***-

Information retrieval:
using citations (scheme)
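(Scheme: a time axis running from past through now to future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.)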

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
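
As a small illustration of how such numbers follow from the results of a citation search, the sketch below counts citations per article and per author. The pairs of citing and cited articles and the author names are hypothetical; real counts depend on the coverage of the database that is searched.

```python
from collections import Counter

# Hypothetical data: (citing article, cited article) pairs found by a citation search,
# and the author of each cited article.
CITATIONS = [("a3", "a1"), ("a4", "a1"), ("a4", "a2"), ("a5", "a2"), ("a5", "a1")]
AUTHOR_OF = {"a1": "Janssens", "a2": "Peeters"}

citations_per_article = Counter(cited for _, cited in CITATIONS)
citations_per_author = Counter(
    AUTHOR_OF[cited] for _, cited in CITATIONS if cited in AUTHOR_OF
)
```

The result is only an estimate: citations in sources that the database does not cover are not counted.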

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
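
Generating the variants to search for can be sketched as follows, assuming that `index.html` is the default page of the site (an assumption; other servers use other default file names). The function name is hypothetical.

```python
def url_variants(url):
    """Return the variants of a site URL that other pages may link to."""
    base = url.rstrip("/")
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each variant can then be submitted as a separate link query, and the results combined, because a search system may treat the three forms as different URLs.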

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 28

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
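
The two components described on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of any real search engine: the page texts are supplied directly instead of being collected by a robot, the example.org addresses are invented, and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented pages standing in for what a robot would have collected beforehand.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine pollution",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the index that was built beforehand, not by visiting the pages at query time, which is why such systems can respond quickly.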

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
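
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a small iterative calculation. The sketch below follows the general style of the published PageRank idea; it is not Google's actual ranking system. The four-page link graph, the function name `link_rank` and the damping value 0.85 are assumptions made for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented link graph: A, B and D point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_rank(links)
print(max(scores, key=scores.get))
```

In this graph C scores highest because three pages point to it, and A scores higher than B or D because its single incoming link comes from the high-scoring page C.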

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has been possible since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
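
What such an automatic expansion amounts to can be sketched as follows. This is a hedged illustration only: the synonym table is invented, the function name `expand_query` is hypothetical, and the source does not describe how Google selects synonyms internally.

```python
# Invented synonym table; a real system would rely on a far larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("~cheap hotel brussels"))
```

Only the word marked with the tilde is expanded; the other words of the query are passed on unchanged.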

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
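
Recall and precision can be made concrete with a small calculation. Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of the relevant documents that were retrieved. The document identifiers below are invented and the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 3 documents are actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall about 0.67).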

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
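
For readers who have not met truncation in other retrieval systems, the sketch below shows what a right-truncated query does: it matches every indexed word that starts with the given letters. The `*` symbol is a convention common in bibliographic databases and is used here as an assumption; the vocabulary and the function name `truncation_search` are invented for the illustration.

```python
def truncation_search(vocabulary, pattern):
    """Return the indexed words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Invented vocabulary standing in for the words of an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "literature"}
print(truncation_search(vocabulary, "librar*"))
```

Without truncation, the searcher has to type each word form separately and combine the forms with OR.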

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
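
The first item, a proximity operator, can be illustrated as follows. This is a minimal sketch of the general idea of NEAR, in which two words must occur within a given number of words of each other; it does not reproduce the syntax of any particular system. The sample text, the function name `near` and the default distance are assumptions.

```python
def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


# Invented sample text.
doc = "Open access journals are indexed in a directory. Fees apply to other journals."
print(near(doc, "open", "journals", 2))
print(near(doc, "open", "fees", 2))
```

A plain AND query would accept both word pairs, because it only checks that the words occur somewhere in the same document.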

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
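
Clustering "on the fly" can be sketched very crudely: the headings are derived from the retrieved result texts themselves at query time, with nothing computed in advance. The code below is an illustration of that idea only and is not the method used by Vivisimo, which the source does not describe. The result snippets, the stopword list and the function names are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on", "his"}


def terms(snippet, query_words):
    """Lower-cased words of a snippet, without stopwords and query words."""
    words = {w.strip(".,;:").lower() for w in snippet.split()}
    return words - STOPWORDS - query_words - {""}


def cluster_results(query, snippets, max_clusters=3):
    """Group snippets under the most frequent words that are not in the query."""
    query_words = set(query.lower().split())
    counts = Counter(t for s in snippets for t in terms(s, query_words))
    # Candidate headings: words shared by at least two results,
    # most frequent first, ties broken alphabetically.
    shared = sorted((t for t in counts if counts[t] > 1),
                    key=lambda t: (-counts[t], t))
    labels = shared[:max_clusters]
    clusters = defaultdict(list)
    for s in snippets:
        found = terms(s, query_words)
        label = next((l for l in labels if l in found), "other")
        clusters[label].append(s)
    return dict(clusters)


# Invented result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal compiler for the Pascal programming language",
]
print(cluster_results("pascal", snippets))
```

With these snippets the results about the person and the results about the programming language end up under separate headings, which is the effect described for the ambiguous query "pascal" earlier in this tutorial.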

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
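
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions: the two result lists are invented, the function names `normalize` and `merge_results` are hypothetical, and real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude key for comparing URLs: host in lower case without 'www.', plus path and query."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Invented result lists from two primary search systems.
engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://www.dmoz.org/"]
engine_b = ["http://dmoz.org", "http://www.scirus.com/"]
print(merge_results(engine_a, engine_b))
```

Only the host name is converted to lower case, because the path of a URL can be case-sensitive (BIBLIO and not biblio).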

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
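The change tracking described above can be sketched as follows. This is a minimal illustration, not how Copernic or any alert service is actually implemented: it assumes the page text has already been downloaded, and the names fingerprint and check_for_update are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_update(stored, url, current_text):
    """Compare the current text with the stored fingerprint for this URL.

    `stored` is a plain dict (URL -> fingerprint) standing in for the
    storage of the tracking program or alert server (an assumption).
    Returns True only when the page was seen before and has changed.
    """
    new = fingerprint(current_text)
    old = stored.get(url)
    stored[url] = new
    return old is not None and old != new
```

A tracking program would call check_for_update at regular intervals for each page followed by the user, and send an alert when it returns True.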

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
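An interest profile stored as keywords can be matched against new titles roughly as follows. This is only a sketch under simple assumptions (case-insensitive substring matching on the title); the function names and the sample titles are invented for illustration and do not describe how Amazon or any other bookshop implements its service.

```python
def matches_profile(title, keywords):
    """True when at least one profile keyword occurs in the title (case-insensitive)."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def new_book_alerts(new_titles, profile_keywords):
    """Return the new titles that fit the stored interest profile."""
    return [t for t in new_titles if matches_profile(t, profile_keywords)]


# Invented sample data: a keyword profile and two newly published titles.
profile = ["ecology", "fisheries"]
new_titles = ["Marine Ecology of Estuaries", "Cooking for Beginners"]
alerts = new_book_alerts(new_titles, profile)
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the title words.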

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
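The principle of simultaneous searching can be sketched as follows. The catalogues here are hypothetical in-memory lists, not real library systems, and the merging rule (titles that differ only in case or spacing count as the same book) is an assumption chosen for the illustration. The sketch also shows why the result depends on the availability of the target systems: a target that fails is simply reported and skipped.

```python
def meta_search(query, catalogues):
    """Send one query to several catalogues and merge the results.

    `catalogues` maps a catalogue name to a search function.
    Returns (merged results, names of catalogues that were unavailable).
    """
    merged = {}
    unavailable = []
    for name, search in catalogues.items():
        try:
            hits = search(query)
        except Exception:
            # The target system could not be reached or returned an error.
            unavailable.append(name)
            continue
        for title in hits:
            key = " ".join(title.lower().split())
            record = merged.setdefault(key, {"title": title, "sources": []})
            record["sources"].append(name)
    return list(merged.values()), unavailable


def make_catalogue(titles):
    """Hypothetical catalogue: a simple word search over a list of titles."""
    return lambda query: [t for t in titles if query.lower() in t.lower()]


def broken_catalogue(query):
    raise ConnectionError("target system not available")


results, unavailable = meta_search("ocean", {
    "library_a": make_catalogue(["Ocean Currents", "River Deltas"]),
    "bookshop_b": make_catalogue(["ocean currents", "Ocean Life"]),
    "library_c": broken_catalogue,
})
```

Only a plain word search is passed on to every target, which reflects the limited search options mentioned above: a meta-search service can offer only what all target systems have in common.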

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
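The use of thesaurus relations to formulate a query can be sketched as follows. The thesaurus entry is invented for illustration and is not taken from any of the systems mentioned; the OR operator and the double quotes around phrases are assumptions, since query syntax differs between search systems.

```python
# Invented sample entry using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the given relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Phrases of several words are placed between double quotes.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; replacing a vague term by a narrower term, instead of adding it, is one way to increase precision.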

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
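The two directions of citation searching can be sketched with a small, invented set of documents and their reference lists. The document names and the helper functions snowball and citing_documents are hypothetical; a real citation index is of course far larger and built from published reference lists.

```python
# Invented data: each document is mapped to the documents it cites.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990"],
    "c_2003": ["seed_2000"],
    "d_2004": ["seed_2000", "c_2003"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    queue = [seed]
    while queue:
        document = queue.pop(0)
        for cited in references.get(document, []):
            if cited not in found:
                found.append(cited)
                queue.append(cited)
    return found


def citing_documents(seed, references):
    """Look forwards in time: which documents cite the seed document?"""
    return sorted(doc for doc, refs in references.items() if seed in refs)


older = snowball("seed_2000", REFERENCES)
newer = citing_documents("seed_2000", REFERENCES)
```

The length of the list returned by citing_documents is the number of citations received by the seed document within this data set.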

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
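The three variants listed above can be generated from a single known URL, for instance as follows. This is a sketch only: the function name url_variants is hypothetical, and it assumes that the default page of the site is named index.html (or index.htm), which is common but not universal. How each variant is then entered in a link search depends on the search system, as explained in its help pages.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the variants of a URL to try in a search for links.

    The scheme (http://) is removed, then three forms are produced:
    with the default page, with a trailing slash, and without either.
    """
    rest = url.split("://", 1)[-1]
    if rest.startswith("//"):
        rest = rest[2:]
    for name in index_names:
        if rest.endswith("/" + name):
            rest = rest[: -len(name)]
            break
    base = rest.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant is searched separately, and the results are combined, because a link to any of the three forms leads to the same page.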

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 29

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial citation
databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
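
The mechanism described above can be illustrated with a minimal sketch in Python. The pages, the example.org addresses and the function names are hypothetical. A real system collects pages continuously with a crawler; that step is replaced here by a small in-memory dictionary.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Fishing statistics for the North Sea",
}

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs that contain the word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
```

A query is answered from the index only, never by visiting the pages at query time, which is why such a system can respond quickly but can also return outdated links.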

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first part
of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
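
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched as an iterative computation on the link graph. This is a simplified illustration and not the actual Google algorithm. The function name, the damping value and the four-page graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from the link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical graph: A, B and D all point to C; C points back to A only.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
RANKS = link_rank(LINKS)
```

In this graph C scores highest because many pages point to it, and A scores higher than B and D because its single incoming link comes from the important page C.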

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
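
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical. How the search engine selects synonyms internally is not described in the source, so this is only an illustration of the principle.

```python
# Hypothetical, hand-made synonym table; a real system relies on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

For instance, `expand_query("cheap ~car rental")` gives `cheap (car OR automobile OR vehicle) rental`: only the word marked with a tilde is expanded.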

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
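
The two quality measures mentioned on this slide, recall and precision, can be computed as in the following sketch. The function name and the document identifiers are hypothetical. In practice the complete set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant documents exist, 2 in common.
PRECISION, RECALL = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here precision is 2/4 and recall is 2/3. A specialised system aims to raise both by covering its subject more completely and by excluding unrelated pages.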

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
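
To make clear what is missing: in systems that do support manual truncation, a pattern such as librar* retrieves all words that start with the given letters. The sketch below shows that behaviour on a small, hypothetical word list. The function name and the asterisk as truncation symbol are assumptions made for the example.

```python
def truncation_search(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical vocabulary of indexed words.
WORDS = ["library", "libraries", "librarian", "liberty"]
MATCHES = truncation_search("librar*", WORDS)
```

Without such a feature the searcher has to type the variants (library, libraries, librarian) explicitly and combine them with OR.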

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the components: user (in / out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the components: user (in / out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: user (in / out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
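
How results can be grouped on the fly, using only the texts returned for the query, is illustrated by the following very simple sketch. It is not the method used by Vivisimo, which is not described in the source. The stop word list, the function names and the grouping rule (the most widely shared word becomes the heading) are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, very short list of words that carry no meaning as a heading.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def words_of(text, query_words):
    """Distinct lower-case words of a snippet, without stop words and query words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - query_words

def cluster_results(query, snippets):
    """Group snippets under the word that they share with the most other snippets."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = [words_of(s, query_words) for s in snippets]
    # Document frequency: in how many snippets does each word occur?
    frequency = Counter(w for ws in word_sets for w in ws)
    clusters = defaultdict(list)
    for snippet, ws in zip(snippets, word_sets):
        if ws:
            # Most widely shared word first; alphabetical order breaks ties.
            label = min(ws, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal the philosopher",
    "Free Pascal compiler for the Pascal programming language",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
```

The four snippets end up under two headings, "philosopher" and "language", without any processing of the documents before the search.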

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the components: user (in / out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
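
The integration step with removal of repeated results can be sketched as follows. The normalisation rule and the function names are hypothetical and deliberately simplistic (for example, the whole address is put in lower case). Real meta-search systems may also re-rank the merged list.

```python
def normalize(url):
    """Reduce trivial variants of an address to one form (simplistic rule)."""
    url = url.strip().lower()
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines; keep the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.example.org/", "http://example.org/fish"]
ENGINE_2 = ["http://example.org", "http://example.net/ocean", "http://example.org/fish/"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
```

The five results are reduced to three, because two of the addresses returned by the second system differ only trivially from addresses returned by the first.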

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: static texts in the WWW that can be indexed ( = on HTTP server computers) are covered partly by Internet indexes. Other parts of the Internet shown: telnet, ftp, ...; databases and file archives accessible through the Internet (CGI, ASP, ...); information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
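
A minimal sketch of how such tracking could work, assuming the tracking program keeps a snapshot of each watched page and compares it with a later snapshot. The function names and the example URLs are hypothetical; real alert services may work quite differently.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the page content changes."""
    return hashlib.sha256(content).hexdigest()

def compare_snapshots(old, new):
    """Compare two snapshots of the form {url: page content as bytes}.

    Returns (modified, added): URLs whose content changed, and URLs
    that were not present in the old snapshot.
    """
    modified = sorted(u for u in new
                      if u in old and fingerprint(new[u]) != fingerprint(old[u]))
    added = sorted(u for u in new if u not in old)
    return modified, added

# Invented example snapshots.
old = {"http://example.org/a": b"version 1", "http://example.org/b": b"same"}
new = {"http://example.org/a": b"version 2", "http://example.org/b": b"same",
       "http://example.org/c": b"brand new page"}
print(compare_snapshots(old, new))
# (['http://example.org/a'], ['http://example.org/c'])
```

The sketch separates modified pages from new pages; a server-based alert service would add the step of sending the two lists by email to the subscriber.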

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
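
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration under simple assumptions: the record fields (`title`, `category`), the profile layout and the function names are hypothetical and do not describe how any bookshop actually implements its alerts.

```python
def matches_profile(book, profile):
    """Return True when a book fits the profile by keyword or by subject category."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

def new_book_alerts(new_books, profile):
    """Return the titles of the new books that should trigger an alert."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]

# Invented example data.
profile = {"keywords": ["oceanography"], "categories": ["Marine science"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Coastal Ecosystems", "category": "Marine science"},
    {"title": "Medieval Cooking", "category": "History"},
]
print(new_book_alerts(new_books, profile))
# ['Introduction to Oceanography', 'Coastal Ecosystems']
```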

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
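
A union catalogue can be thought of as a merge of the records of several libraries, keyed on some shared identifier. The sketch below assumes that every record carries an `isbn` field; the library names, identifiers and function name are placeholders invented for this example, and real union catalogues use more elaborate matching.

```python
def union_catalogue(catalogues):
    """Merge {library: [record, ...]} into {isbn: {"title", "libraries"}}."""
    merged = {}
    for library, records in catalogues.items():
        for rec in records:
            entry = merged.setdefault(
                rec["isbn"], {"title": rec["title"], "libraries": []})
            if library not in entry["libraries"]:
                entry["libraries"].append(library)
    return merged

# Invented example records with placeholder identifiers.
catalogues = {
    "Library A": [{"isbn": "id-0001", "title": "Marine Ecology"}],
    "Library B": [{"isbn": "id-0001", "title": "Marine Ecology"},
                  {"isbn": "id-0002", "title": "Ocean Data"}],
}
for isbn, entry in union_catalogue(catalogues).items():
    print(isbn, entry["title"], entry["libraries"])
```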

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
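
The idea of simultaneous searching can be sketched with a small program that sends one query to several target systems in parallel and merges the answers. The targets here are plain Python functions standing in for library and bookdealer databases; all names are hypothetical, and a target that fails is simply reported as unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(query, targets):
    """Query all targets in parallel.

    targets: {name: function(query) -> list of titles}
    Returns (merged, unavailable), where merged maps each title to the
    targets that returned it.
    """
    def safe_search(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            return name, None  # this target could not be reached

    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        results = dict(pool.map(safe_search, targets.items()))

    merged = {}
    for name, titles in results.items():
        for title in titles or []:
            merged.setdefault(title, []).append(name)
    unavailable = sorted(n for n, t in results.items() if t is None)
    return merged, unavailable

# Invented stand-ins for target systems.
def library_a(query):
    return ["Marine Ecology", "Ocean Data"]

def bookshop_b(query):
    return ["Marine Ecology"]

def offline_catalogue(query):
    raise ConnectionError("target system is offline")

print(search_all("marine", {"library_a": library_a,
                            "bookshop_b": bookshop_b,
                            "offline_catalogue": offline_catalogue}))
```

Because each target receives only the plain query string, the sketch also shows why search options are limited: nothing specific to one target system can be expressed.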

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
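
The procedure described above can be sketched as a small query expansion step: look up the search term in a thesaurus, collect synonyms and narrower terms, and combine them with OR. The thesaurus entry below is invented for illustration, and the query syntax (OR, double quotes for phrases) is an assumption that has to be checked against the help pages of the database that is searched.

```python
# Invented thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal sea"],
        "BT": ["water body"],
        "RT": ["marine science"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and its terms under the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(expand_query("sea", THESAURUS))
# sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms tends to increase recall; choosing a more suitable or narrower term instead of a vague one tends to increase precision.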

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
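
Both directions can be illustrated on a small citation network. In this sketch the documents and their reference lists are invented; `snowball` follows reference lists backwards in time, while `citing_documents` does what a citation index makes possible, namely finding later documents that cite the seed.

```python
# REFERENCES[doc] lists the older documents cited by doc (invented data).
REFERENCES = {
    "C2004": ["B2001", "A1995"],
    "D2005": ["B2001"],
    "B2001": ["A1995"],
    "A1995": [],
}

def snowball(seed, references):
    """Backward search: follow reference lists repeatedly (snowball effect)."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found

def citing_documents(seed, references):
    """Forward search: documents whose reference list contains the seed."""
    return sorted(doc for doc, refs in references.items() if seed in refs)

print(snowball("C2004", REFERENCES))          # ['B2001', 'A1995']
print(citing_documents("B2001", REFERENCES))  # ['C2004', 'D2005']
```

The backward search needs only the documents themselves; the forward search needs a database in which the reference lists of many documents have been indexed.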

211

***-

Information retrieval:
using citations (scheme)
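(Scheme: a time axis with Past, Now and Future, and a seed document;
snowball citation searching points from the seed document towards the past,
citation indexing points from the seed document towards the future.)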

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
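
Counting citations from the results of a citation search is simple arithmetic, as the following sketch shows. The citation pairs and the authorship table are invented; the counts are only as complete as the database from which the pairs were retrieved, which is why the result is an estimate.

```python
from collections import Counter

# Each pair is (citing article, cited article); invented data.
CITATIONS = [("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P2"), ("P5", "P1")]
# Hypothetical authorship of the cited articles.
AUTHOR_OF = {"P1": "Author A", "P2": "Author B"}

def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _, cited in citations)

def citations_per_author(citations, author_of):
    """Sum the citations received by all articles of each author."""
    counts = Counter()
    for _, cited in citations:
        counts[author_of[cited]] += 1
    return counts

print(citations_per_article(CITATIONS)["P1"])               # 3
print(citations_per_author(CITATIONS, AUTHOR_OF)["Author B"])  # 2
```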

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
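
The three variants above can be generated automatically, so that none is forgotten when links to a site are counted. The sketch below is an illustration only: the function names are invented, the table of outgoing links stands in for what a search engine would return, and `index.html` is assumed to be the default page of the site.

```python
def url_variants(url):
    """Return the three common ways of writing the address of a site."""
    base = url
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]

def pages_linking_to(url, outlinks):
    """Return the pages whose outgoing links contain any variant of url."""
    variants = set(url_variants(url))
    return sorted(page for page, links in outlinks.items()
                  if variants & set(links))

# Invented table of outgoing links per citing page.
OUTLINKS = {
    "page-1": ["//web-server-computer.country/website/index.html"],
    "page-2": ["//web-server-computer.country/website"],
    "page-3": ["//another-computer.country/"],
}
print(url_variants("//web-server-computer.country/website/"))
print(pages_linking_to("//web-server-computer.country/website/", OUTLINKS))
# ['page-1', 'page-2']
```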

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
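
A minimal sketch of this "normal" approach, assuming that the search system accepts an exact phrase between double quotes (check the help pages of the system you use). The function names and the page texts are invented; the second function imitates the search on a small set of texts, matching the address with or without the leading "http://".

```python
def exact_phrase_query(url):
    """Wrap a URL in double quotes so that it is searched as an exact phrase."""
    return f'"{url}"'

def pages_mentioning(url, pages):
    """Return the names of the pages whose text contains the URL."""
    bare = url.split("://", 1)[-1]  # drop the scheme, if present
    return sorted(name for name, text in pages.items() if bare in text)

# Invented page texts.
PAGES = {
    "review": "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "list": "Useful: www.computer.com/directory/subdirectory/web-page.html",
    "other": "Nothing relevant here.",
}
URL = "http://www.computer.com/directory/subdirectory/web-page.html"
print(exact_phrase_query(URL))
print(pages_mentioning(URL, PAGES))  # ['list', 'review']
```

This kind of search also finds pages that mention the address in plain text without making a hyperlink, which a search restricted to hyperlink fields would miss.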

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 30

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with
commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
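
The "simply stated" rule above can be sketched in a few lines of Python. This is only an illustration of counting received links: the site names and the link data are hypothetical, and the sketch does not claim to reproduce the link analysis that Google actually applies.

```python
from collections import Counter

# Hypothetical link data: (linking site, linked site).
LINKS = [
    ("site-a", "site-c"),
    ("site-b", "site-c"),
    ("site-a", "site-b"),
    ("site-d", "site-c"),
    ("site-d", "site-b"),
]

def rank_by_inlinks(entries, links):
    """Order directory entries by the number of links they receive."""
    counts = Counter(target for _source, target in links)
    # Most links first; entries with equal counts stay in alphabetical order.
    return sorted(entries, key=lambda site: (-counts[site], site))

print(rank_by_inlinks(["site-a", "site-b", "site-c"], LINKS))
```

An alphabetical listing would show site-a first; the link count moves site-c, which receives the most links, to the top.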

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
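
The two components described above (a machine-made index built from page texts, and a search function that queries it) can be illustrated with a minimal Python sketch. The pages and URLs are hypothetical stand-ins for what a "robot" would collect, and the AND-only search is an assumption made for simplicity; real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "History of the Pascal programming language",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "marine"))
print(search(INDEX, "marine science"))
```

The query is answered from the prebuilt index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.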

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited to a
maximum of 10 words per query.)
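
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified link-score iteration in Python. This is a textbook-style sketch under stated assumptions: the tiny link graph, the damping value of 0.85 and the number of iterations are all chosen here for illustration, and the code is not presented as Google's actual algorithm.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Compute a score per page from a dict: page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores

# Hypothetical mini-web: "c" receives two links; "d" and "e" receive one each.
WEB = {"a": ["c", "e"], "b": ["c"], "c": ["d"], "d": [], "e": []}

SCORES = link_scores(WEB)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this mini-web, "d" and "e" each receive a single link, but "d" is linked from the well-linked page "c" and therefore ends up with a higher score than "e".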

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
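
What such an automatic expansion amounts to can be sketched in Python. The synonym table below is hypothetical and tiny, and rewriting the marked word as an OR-group is an assumption made for illustration; how Google implements the tilde internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is broadened; the other words in the query are left as typed.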

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
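
Recall and precision, as mentioned above, follow the standard definitions in information retrieval: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A small Python sketch with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set, given the relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are really relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d7"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 3 relevant documents are found (recall about 0.67).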

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
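
To make clear what is missing, here is a minimal Python sketch of manual right-hand truncation as offered by many classical retrieval systems. The asterisk as truncation symbol and the word list are assumptions chosen for illustration.

```python
def truncation_match(pattern, words):
    """Return the words that match a pattern; a trailing '*' means truncation."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # Without the truncation symbol only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]

# Hypothetical vocabulary of an index.
WORDS = ["library", "libraries", "librarian", "liberty"]

print(truncation_match("librar*", WORDS))
print(truncation_match("library", WORDS))
```

Without truncation, the searcher has to type each word form (library, libraries, librarian…) explicitly to cover the concept.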

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
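
A proximity operator such as NEAR requires that two words occur close to each other in a document. A minimal Python sketch of that test follows; the default maximum distance of 3 word positions and the sample sentence are assumptions made for illustration.

```python
import re

def near(text, word1, word2, max_distance=3):
    """True if word1 and word2 occur within max_distance word positions."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

TEXT = "Open access journals in marine science"

print(near(TEXT, "open", "journals"))
print(near(TEXT, "open", "science"))
```

A plain Boolean AND would accept both word pairs; the proximity test accepts only the pair whose words stand close together.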

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
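
Clustering "on the fly" means that the groups are derived from the retrieved results themselves at search time. The Python sketch below is a deliberately crude illustration of that idea: each result title is labelled with the non-query term it shares with the most other titles. The titles, the stop word list and the labelling rule are assumptions; this is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "a", "and", "in", "for", "to"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query term."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def terms(title):
        words = re.findall(r"[a-z]+", title.lower())
        return {w for w in words if w not in STOPWORDS and w not in query_words}

    # Count in how many titles each candidate term occurs.
    frequency = Counter(t for title in titles for t in terms(title))
    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most frequent term wins; alphabetical order breaks ties.
            label = min(candidates, key=lambda t: (-frequency[t], t))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Thoughts of the philosopher Blaise Pascal",
]

for label, members in cluster_results("pascal", TITLES).items():
    print(label, "->", members)
```

For the ambiguous query "pascal", the titles about the programming language and those about Blaise Pascal end up in two separate groups, without any processing of the documents before the search.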

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
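
The integration of results with removal of repeated hits can be sketched in Python. The result lists are hypothetical, and two choices are assumptions made for illustration: URLs that differ only by a leading "www." or a trailing slash are treated as the same page, and the ranked lists are interleaved in round-robin order. Real meta-search systems may merge differently.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Build a comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each page."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results returned by two primary search systems.
ENGINE_1 = ["http://www.example.org/a/", "http://www.example.com/b"]
ENGINE_2 = ["http://example.com/b", "http://example.org/a", "http://example.net/c"]

print(merge_results([ENGINE_1, ENGINE_2]))
```

The five incoming hits are reduced to three distinct pages, while the top-ranked hit of each engine stays near the top of the merged list.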

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Internet: WWW, telnet, ftp, ...
• Static texts in the WWW that can be indexed ( = on HTTP server computers):
covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems ask the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
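
The difference between a modified page and a new page can be illustrated with a small Python sketch. It assumes that the tracking service stores one fingerprint per known URL; the names `fingerprint` and `changed_pages` are hypothetical and do not describe how any specific alert service works.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text after collapsing whitespace, so layout-only changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(stored, current):
    """Compare stored fingerprints (url -> hash) with freshly fetched texts (url -> text).

    Returns two sorted lists: URLs whose content was modified, and URLs not seen before.
    """
    modified = [url for url, text in current.items()
                if url in stored and stored[url] != fingerprint(text)]
    new = [url for url in current if url not in stored]
    return sorted(modified), sorted(new)
```

Fetching the pages and sending the alert by email are left out; only the comparison step is shown.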

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
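
A minimal Python sketch of simultaneous searching follows. It assumes each target catalogue is represented by a function that takes a query and returns a list of titles; the functions `library_a` and `library_b` are stand-ins, not real catalogue interfaces, and no Z39.50 client is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(catalogues, query):
    """Send one query to several catalogues in parallel.

    catalogues maps a name to a callable(query) -> list of titles.
    A target that fails is reported as None instead of stopping the whole search.
    """
    def run(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            return name, None  # target system unavailable or malfunctioning

    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        return dict(pool.map(run, catalogues.items()))


# Hypothetical stand-ins for two target systems.
def library_a(query):
    return [query + " handbook"]


def library_b(query):
    raise ConnectionError("catalogue is down")
```

Here `search_all({"A": library_a, "B": library_b}, "tides")` still returns the results of catalogue A although catalogue B is unavailable, which mirrors the dependence on the target systems noted above.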

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
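
The procedure above can be sketched in Python. The thesaurus entry below is invented for illustration only; the function name `expand_query` and the choice to add synonyms (UF) and narrower terms (NT) by default are assumptions, and the OR syntax must be adapted to the database that is searched.

```python
# A tiny, made-up thesaurus: term -> relations (BT, NT, RT, UF).
THESAURUS = {
    "ocean": {
        "BT": ["body of water"],
        "NT": ["Atlantic Ocean"],
        "RT": ["marine science"],
        "UF": ["sea"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for extra in entry.get(relation, []):
            if extra not in terms:
                terms.append(extra)
    # Quote multi-word terms so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms tends to raise recall; replacing a vague word by a more suitable thesaurus term is what can raise precision.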

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
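
Both directions can be illustrated with a small Python sketch. The citation data are invented, and the function names `citing_documents` and `snowball` are hypothetical; a real citation index is of course far larger and is queried through its own interface.

```python
# Made-up citation data: each document maps to the documents it cites.
CITATIONS = {
    "D2005": ["S2000"],
    "E2004": ["S2000", "B1990"],
    "S2000": ["B1990", "A1985"],
    "B1990": ["A1985"],
}


def citing_documents(citations, doc):
    """Citation index lookup: more recent documents that cite doc."""
    return sorted(d for d, refs in citations.items() if doc in refs)


def snowball(citations, seed, max_depth=2):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, frontier = [], [seed]
    for _ in range(max_depth):
        next_frontier = []
        for doc in frontier:
            for ref in citations.get(doc, []):
                if ref != seed and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found
```

With the seed document "S2000", `snowball` returns the older documents reached through reference lists, while `citing_documents` returns the newer documents that cite the seed.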

211

***-

Information retrieval:
using citations (scheme)
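[Scheme along a time axis (past, now, future):
snowball citation searching leads from the seed document back to older publications;
citation indexing leads forward to more recent publications.]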

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
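
The three variants listed above can be generated automatically. The following Python sketch is only an illustration: the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called index.html or index.htm.

```python
def url_variants(url):
    """Return the spellings under which the same page may be linked."""
    base = url.rstrip("/")
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]  # strip the default page name
            break
    return [base + "/index.html", base + "/", base]
```

Each variant can then be submitted as a separate link query to the search system, using the syntax that its help pages prescribe.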

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 31

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view on citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
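
The "simply stated" rule above can be illustrated with a minimal sketch. It only counts incoming links and is not Google's actual method; the site names and the link list are hypothetical, invented for this illustration.

```python
from collections import Counter

def rank_by_inlinks(entries, links):
    """Sort directory entries by how many distinct sites link to each one."""
    inlinks = Counter()
    # set() removes repeated (source, target) pairs; self-links are ignored.
    for source, target in set(links):
        if source != target:
            inlinks[target] += 1
    # Ties fall back to alphabetical order, as in the plain DMOZ listing.
    return sorted(entries, key=lambda site: (-inlinks[site], site))

# Hypothetical directory category and hypothetical links between sites.
entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
    ("x.example", "beta.example"),  # repeated link, counted once
]
ranked = rank_by_inlinks(entries, links)
print(ranked)  # ['gamma.example', 'beta.example', 'alpha.example']
```

With an empty link list the function returns the entries in alphabetical order, which corresponds to the unranked DMOZ view.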

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
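
The mechanism described above (pages collected in advance, an index built from their text, and a search step that consults only that index) can be sketched as follows. This is a toy illustration, not the design of any real search engine: the page URLs and texts are hypothetical, and the collecting "robot" is assumed to have run already.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that the "robot" is assumed to have collected earlier.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))  # ['http://example.org/b']
```

The query is answered from the index alone, without contacting the pages themselves, which is why a search can return pages that have since changed or disappeared.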

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
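
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified iterative link-analysis sketch in the spirit of the published PageRank idea. It is not Google's actual implementation; the damping value 0.85, the number of iterations and the four-page link graph are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively; links maps each page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score, independent of links.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: a, b and d point to c; c points back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph page c ranks first because many pages point to it, and page a ranks above b and d because the "important" page c points to it, even though a receives only one link.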

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
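
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration of the principle, not a description of how Google implements it: the synonym table is hypothetical and tiny, and the OR-group notation is an assumption chosen to make the expanded query readable.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "fish": ["fishes", "seafood"],
    "car": ["automobile", "vehicle"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
# cheap (car OR automobile OR vehicle) rental
```

Only the words marked with a tilde are expanded; the other words of the query are passed on unchanged.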

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
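
Recall and precision, the two quality measures named above, can be computed as in the following minimal sketch. The document identifiers are hypothetical; in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are relevant,
# and 2 of the retrieved documents are among the relevant ones.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision)         # 0.5
print(round(recall, 3))  # 0.667
```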

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
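
For readers unfamiliar with truncation, the following minimal sketch shows what a right-truncated query term would match in a system that does support it. The asterisk as truncation symbol and the small vocabulary are assumptions for this illustration; systems differ in the symbol they use.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(truncation_match("librar*", vocabulary))
# ['librarian', 'libraries', 'library']
print(truncation_match("library", vocabulary))
# ['library']
```

Without truncation or stemming, the searcher has to enter such word variants explicitly, for instance combined with OR.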

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
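
The idea of clustering results on the fly, using only the text returned with the hits, can be sketched as follows. This is a deliberately naive illustration and not Vivisimo's algorithm, which is not described here: each result is simply filed under the non-query word that it shares with the most other results. The result titles are hypothetical and reuse the "pascal" query from the Teoma example.

```python
from collections import Counter, defaultdict
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    candidates = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        candidates.append(words - query_words - STOP_WORDS)
    # Document frequency: in how many titles does each word occur?
    frequency = Counter(word for words in candidates for word in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, candidates):
        if words:
            # Prefer the most shared word; break ties alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher and mathematician",
    "The philosopher Pascal: a biography",
    "Pascal programming language tutorial",
    "Free Pascal compiler for programming",
]
clusters = cluster_results("pascal", titles)
for label, members in clusters.items():
    print(label, "->", members)
```

Here the four hits fall into two groups, labelled "philosopher" and "programming". Nothing is computed before the query arrives, which is the point of the "on the fly" remark above.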

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
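
The integration step with removal of repeated results can be sketched as follows. This is one possible approach under stated assumptions, not the method of any particular meta-search system: the ranked lists are interleaved round-robin, and two URLs count as the same result when they differ only in the case of the scheme or host name or in a trailing slash. The engine result lists are hypothetical.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Lowercase scheme and host and drop a trailing slash; keep the path's case."""
    parts = urlsplit(url)
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return key + ("?" + parts.query if parts.query else "")

def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results returned by two primary search systems.
engine_1 = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/x"]
engine_2 = [
    "http://example.org/x/",
    "http://WWW.VUB.AC.BE/BIBLIO",
    "http://example.org/y",
]
merged = merge_results([engine_1, engine_2])
print(merged)
# ['http://www.vub.ac.be/BIBLIO/', 'http://example.org/x/', 'http://example.org/y']
```

The path is compared case-sensitively on purpose, because on many servers BIBLIO and biblio are different addresses.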

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme showing the components of the Internet:
»telnet, ftp, ... and the WWW
»databases and file archives accessible through the Internet (CGI, ASP,...)
»static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
»information accessible only when passwords are used
»rapidly changing information, such as news
»Word files and PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
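The difference between modified and new pages can be sketched as follows, assuming that the tracking system stores a fingerprint of each page at every visit. The function names and the example addresses are hypothetical, and fetching the pages is left out so that the sketch stays self-contained.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes when the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(previous, current_pages):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict url -> fingerprint from the last visit
    current_pages: dict url -> page text fetched now
    Returns (modified_urls, new_urls, updated_fingerprints).
    """
    modified, new, updated = [], [], {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        updated[url] = digest
        if url not in previous:
            new.append(url)        # page was not known before
        elif previous[url] != digest:
            modified.append(url)   # known page with changed content
    return modified, new, updated

stored = {"http://example.org/a": fingerprint("old text")}
fetched = {"http://example.org/a": "new text", "http://example.org/b": "hello"}
modified, new, stored = check_pages(stored, fetched)
```

An alert service would then send the `modified` and `new` lists to the subscriber, for instance by email.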

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
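Matching new books against a stored interest profile could work roughly as in the sketch below. The record layout (`title`, `category`), the profile and the book titles are invented for illustration; they do not describe how any particular bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True when a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical profile stored on the server of the system
profile = {"keywords": ["aquaculture"], "categories": {"Marine science"}}

# Hypothetical list of newly published books
new_books = [
    {"title": "Aquaculture in Practice", "category": "Agriculture"},
    {"title": "Ocean Currents", "category": "Marine science"},
    {"title": "Medieval Towns", "category": "History"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```

The first book matches on a keyword, the second on a subject category, and the third on neither.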

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/PubMed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: a term and its relations to other terms]
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
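The use of a thesaurus before searching can be illustrated with a small sketch. The thesaurus entry for "sea" below is invented for illustration, and the `OR` syntax is an assumption; the query syntax of the database that you actually search may differ.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Add synonyms and narrower terms to a single search term."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Combine all terms with OR; quotes keep multi-word terms together
    return " OR ".join(f'"{t}"' for t in terms)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; replacing an ambiguous word by a more suitable term found in the thesaurus is what can improve precision.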

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
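Both directions of citation searching can be sketched on a small, invented set of citation data, where each document is mapped to the earlier documents that it cites. The document labels and function names are hypothetical.

```python
# Hypothetical data: document -> earlier documents that it cites
CITES = {
    "seed2000": ["a1995", "b1990"],
    "a1995": ["b1990"],
    "c2003": ["seed2000"],
    "d2004": ["seed2000", "c2003"],
}

def older_documents(seed, cites):
    """Snowball effect: follow reference lists backwards in time."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(cites.get(doc, []))
    return found

def citing_documents(seed, cites):
    """Citation index: more recent documents that cite the seed."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)

older = older_documents("seed2000", CITES)
newer = citing_documents("seed2000", CITES)
```

The first function needs only the reference lists of the documents themselves; the second needs an index over many documents, which is what a citation index provides.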

211

***-

Information retrieval:
using citations (scheme)
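[Scheme: a time axis from past to future, with the seed document at “now”. Snowball citation searching leads from the seed document to the past; citation indexing leads from the seed document to the future.]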

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
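Counting citations per article and per author is simple once the results of a citation search are available as records. The records and author names below are invented for illustration.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of cited article)
citation_records = [
    ("c2003", "seed2000", "Jansen"),
    ("d2004", "seed2000", "Jansen"),
    ("d2004", "a1995", "Peeters"),
    ("e2005", "x2001", "Jansen"),
]

# Number of citations received by each article and by each author
per_article = Counter(cited for _, cited, _ in citation_records)
per_author = Counter(author for _, _, author in citation_records)
```

Such counts are only an estimate: they depend on which citing documents are covered by the database that was searched.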

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
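The variants listed above can be generated automatically, as in this minimal sketch. The function name `url_variants` is hypothetical, and it assumes that the default file name of the site is `index.html`.

```python
def url_variants(url, index_name="index.html"):
    """Return the given URL plus its variants without the default
    file name and without the trailing slash."""
    variants = [url]
    base = url
    if base.endswith("/" + index_name):
        base = base[: -len(index_name)]  # keep the trailing slash
    with_slash = base if base.endswith("/") else base + "/"
    without_slash = with_slash.rstrip("/")
    for variant in (with_slash, without_slash):
        if variant not in variants:
            variants.append(variant)
    return variants

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant can then be used in a separate link search, and the results combined.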

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 32

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.
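The simple statement above can be sketched as follows. The site names and the observed links are hypothetical, and the real link analysis by Google is certainly more refined than a plain count.

```python
from collections import Counter

# Hypothetical directory category and hypothetical links observed on the WWW.
CATEGORY = ["alpha.example", "beta.example", "gamma.example"]
OBSERVED_LINKS = [
    ("x.example", "beta.example"),
    ("y.example", "beta.example"),
    ("z.example", "gamma.example"),
]


def rank_by_inlinks(sites, links):
    """Order the sites of a category by the number of links they receive."""
    inlinks = Counter(target for _source, target in links)
    # most-linked first; alphabetical order breaks ties
    return sorted(sites, key=lambda site: (-inlinks[site], site))


print(rank_by_inlinks(CATEGORY, OBSERVED_LINKS))
```

A purely alphabetical directory would list alpha.example first; the link count moves beta.example to the top.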

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

(Diagram: within the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”))

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
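The two components described above can be illustrated with a very small sketch. The pages are hypothetical and assumed to have been collected already by the robot; real systems add ranking, scale and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Open access journals in biology",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))
```

The query is answered from the prepared index only; no page is fetched from the Internet at search time, which is why such systems can answer quickly.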

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
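The two ranking criteria above (many pages point to it, “important” pages point to it) can be illustrated with a generic link-analysis iteration. This is only a sketch on a hypothetical link graph; the damping value and the number of iterations are assumptions, and this is not presented as the actual algorithm used by Google.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": ["C"],
    "E": ["D"],
}


def link_scores(links, damping=0.85, iterations=50):
    """Give each page a score that grows with the scores of the pages linking to it."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its score on, divided over its outgoing links
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # a page without outgoing links spreads its score evenly
                for target in pages:
                    new[target] += damping * score[page] / len(pages)
        score = new
    return score


scores = link_scores(LINKS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Page C receives the most links and comes first. Page D receives only two links, but one of them comes from the high-scoring page C, so D ends up far above A, B and E.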

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
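What such an expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical; how Google selects synonyms internally is not documented here.

```python
# Toy synonym table; a real system would rely on a far larger internal resource.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "car": ["automobile", "auto"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonym known: keep the plain word
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("rent ~cheap ~car brussels"))
```

Only the words marked with a tilde are expanded; the other words of the query are left unchanged.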

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company,
Yahoo!, since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
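For readers who do not know these two features, the following sketch shows what they do in systems that offer them. The word list is hypothetical, the asterisk as truncation symbol is an assumption (systems differ), and the stemmer is deliberately crude compared with real stemming algorithms.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary of an index.
VOCABULARY = [
    "librarian", "libraries", "library", "liberty",
    "search", "searched", "searching",
]


def truncate_match(pattern, words):
    """Manual truncation: 'librar*' matches every word that starts with 'librar'."""
    return [word for word in words if fnmatchcase(word, pattern)]


def crude_stem(word):
    """Very rough stemming: strip one common English suffix, keep at least 3 letters."""
    for suffix in ("ing", "ed", "ies", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(truncate_match("librar*", VOCABULARY))
print(sorted({crude_stem(word) for word in ("search", "searched", "searching")}))
```

With truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form automatically. Without either feature, each variant must be typed into the query explicitly.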

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
(Diagram showing: the user (in / out); a client computer + WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems.)

97

**--

Meta- search systems:
scheme 2
(Diagram showing: the user (in / out); a client computer + multi-threaded Internet search client program; the Internet / WWW; WWW server computers with Internet search systems.)

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
(Diagram showing: the user (in / out); a client computer + WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems.)

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
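That clustering is possible on the fly, using only the text of the returned hits, can be illustrated with a toy example. The hit titles are hypothetical and the grouping rule is an assumption chosen for simplicity; the method actually used by Vivisimo is not described here.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "by"}

# Hypothetical result titles for the ambiguous query "pascal".
HITS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pensées by the philosopher Blaise Pascal",
]


def words(text):
    """Lower-case words of a text, without stopwords."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def cluster(hits, query):
    """Group hits under the most widely shared term other than the query words."""
    query_words = set(words(query))
    doc_freq = Counter()
    for hit in hits:
        doc_freq.update(set(words(hit)) - query_words)
    clusters = defaultdict(list)
    for hit in hits:
        candidates = set(words(hit)) - query_words
        # label = candidate term found in most hits; ties broken alphabetically
        label = min(candidates, key=lambda w: (-doc_freq[w], w)) if candidates else "other"
        clusters[label].append(hit)
    return dict(clusters)


for label, members in cluster(HITS, "pascal").items():
    print(label, members)
```

The hits about the programming language and the hits about the philosopher end up in separate groups, although nothing was known about the documents before the search.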

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
(Diagram showing: the user (in / out); a client computer + multi-threaded Internet search client program; the Internet / WWW; WWW server computers with Internet search systems.)

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
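
The integration step with removal of repeated results can be sketched as follows. The normalisation rules (lower-case host name, no trailing slash, no fragment) are assumptions chosen for this illustration; real meta-search systems may use other rules.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key (assumed rules, for illustration)."""
    parts = urlsplit(url)
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    if parts.query:
        key += f"?{parts.query}"
    return key

def merge_results(result_lists):
    """Merge result lists from several search systems, keeping first occurrences."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

merged = merge_results([
    ["http://Example.org/a/", "http://example.org/b"],
    ["http://example.org/a", "http://example.org/c"],
])
```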

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
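
The difference between modified pages and new pages can be sketched as follows, assuming that the tracking system stores a fingerprint of each page it has seen before. The function names and the sample pages are hypothetical; fetching the pages from the WWW is left out.

```python
import hashlib

def fingerprint(page_text):
    """Short, fixed-length summary of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the pages as they are now.

    previous: dict mapping URL to the fingerprint stored at the last visit
    current_pages: dict mapping URL to the text of the page now
    Returns (modified URLs, new URLs).
    """
    modified, new = [], []
    for url, text in current_pages.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return sorted(modified), sorted(new)

stored = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
now = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
modified, new = detect_changes(stored, now)
```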

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
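
Matching new books against a stored interest profile can be sketched as follows. This is a minimal illustration with hypothetical names and sample data, not the way any particular bookshop implements its alert service: a book is reported when a profile keyword occurs in its title or when its subject category is in the profile.

```python
def matches_profile(book, keywords=(), subjects=()):
    """True when the book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in keywords)
    subject_hit = book.get("subject") in subjects
    return keyword_hit or subject_hit

def alerts(new_books, keywords=(), subjects=()):
    """Titles of the new books that should be announced to the user."""
    return [b["title"] for b in new_books if matches_profile(b, keywords, subjects)]

new_books = [
    {"title": "Marine Ecology Today", "subject": "Biology"},
    {"title": "Cooking Basics", "subject": "Food"},
    {"title": "Tides", "subject": "Oceanography"},
]
to_announce = alerts(new_books, keywords=["ecology"], subjects=["Oceanography"])
```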

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
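
Using thesaurus relations to formulate a query can be sketched as follows. The small thesaurus entry is invented for this illustration and the OR syntax is an assumption; the query syntax of the database to be searched may differ.

```python
# Hypothetical thesaurus entry with the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # used for (synonym)
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and the terms found through the thesaurus."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Terms of more than one word are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term instead of the first word that comes to mind can also improve precision.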

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
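
The two directions of citation searching can be sketched with a small citation graph. The documents A to F and their citations are invented for this illustration; in practice the reference lists come from the documents themselves and the reverse lookup comes from a citation index.

```python
# Hypothetical data: each document is mapped to the documents that it cites.
CITES = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "E": ["D"],
    "F": ["D", "E"],
}

def older_documents(seed, cites):
    """Snowball searching: follow reference lists back in time."""
    found, stack = set(), [seed]
    while stack:
        document = stack.pop()
        for reference in cites.get(document, []):
            if reference not in found:
                found.add(reference)
                stack.append(reference)
    return sorted(found)

def citing_documents(seed, cites):
    """Citation index lookup: more recent documents that cite the seed."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)

older = older_documents("D", CITES)
newer = citing_documents("D", CITES)
```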

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
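
Generating the variants listed above can be sketched as follows, assuming that the start page of the site is named index.html. The function name is hypothetical, and the syntax of the actual link search still has to be taken from the help pages of the search system.

```python
def link_target_variants(url, index_names=("index.html",)):
    """List the forms under which other pages may link to the same start page."""
    base = url.split("://", 1)[-1]          # drop "http://" when present
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]       # drop the file name, keep the directory
            break
    base = base.rstrip("/")
    return [f"{base}/{name}" for name in index_names] + [base + "/", base]

variants = link_target_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be used in a separate link search, and the results can be combined.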

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
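
The "simply stated" version of this idea can be sketched as follows. This is an illustration of ordering by the number of received links only; the site names are hypothetical and the actual link analysis used by Google is more elaborate.

```python
from collections import Counter


def rank_by_inlinks(entries: list[str], links: list[tuple[str, str]]) -> list[str]:
    """Sort directory entries by the number of distinct pages linking to them."""
    inlinks: Counter[str] = Counter()
    for source, target in set(links):      # count each (source, target) pair once
        if source != target:               # ignore links from a site to itself
            inlinks[target] += 1
    # Most-linked first; alphabetical order breaks ties.
    return sorted(entries, key=lambda site: (-inlinks[site], site))


if __name__ == "__main__":
    entries = ["a.example", "b.example", "c.example"]
    links = [
        ("x.example", "b.example"),
        ("y.example", "b.example"),
        ("x.example", "c.example"),
        ("x.example", "c.example"),        # duplicate, counted once
    ]
    print(rank_by_inlinks(entries, links))
```

An alphabetical listing would show a.example first; the link-based ordering puts b.example first.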

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
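
The mechanism described above (collect pages, build an index from their texts, then search the index instead of the live Internet) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the pages are supplied as an in-memory dictionary instead of being fetched by a robot, the addresses are hypothetical, and all query words are combined with AND.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: word -> set of URLs containing that word."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a robot would have collected.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography",
    "http://example.org/b": "Marine engineering",
    "http://example.org/c": "Freshwater biology",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "marine biology"))
```

The query is answered from the index alone, which is why a page that changed after the robot's last visit can still be retrieved with its old contents.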

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
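
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified iterative link-analysis calculation in the style of the published PageRank idea. This is a sketch under assumptions: the damping value 0.85, the number of iterations and the four-page link graph are chosen for the illustration, and the actual algorithm used by Google is not described in these slides.

```python
def link_rank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Score pages so that links from highly scored pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its own score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: page -> pages it links to.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
    scores = link_rank(graph)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this small graph, page c ranks first because two pages point to it. Page a receives only one link, but it comes from the highly ranked page c, so a ends up above b, which is linked only from the low-ranked page d.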

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
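
What such an automatic expansion amounts to can be sketched as a rewrite of the query into an OR-group. This is only an illustration: the small synonym table is a hypothetical stand-in, and the synonym lists that Google actually applies are not known from these slides.

```python
# Hypothetical toy thesaurus, used for this illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Replace every ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)   # no synonyms known: keep the word
        else:
            expanded.append(term)       # words without a tilde stay as they are
    return " ".join(expanded)


if __name__ == "__main__":
    print(expand_query("cheap ~car rental"))
    # cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; "cheap" has synonyms in the table but is left untouched because it was not marked.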

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.

• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.

• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
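
For readers who have not met truncation in other retrieval systems: a truncated query term such as librar* stands for every index term that begins with the given letters. The following minimal sketch shows right-hand truncation over a hypothetical list of index terms; it is not a feature of Google, which, as stated above, did not offer it.

```python
def truncation_matches(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the index terms matched by a right-truncated pattern like 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol, only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


if __name__ == "__main__":
    vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
    print(truncation_matches("librar*", vocabulary))
    # ['librarian', 'libraries', 'library']
```

Without truncation, the searcher has to type all these word variants explicitly and combine them with OR.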

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
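
A proximity operator such as NEAR requires that two words occur within a given number of words of each other in a document. The sketch below illustrates the idea on a single text; the function name, the default distance of 5 words and the sample sentence are assumptions for this illustration.

```python
import re


def near(text: str, word_a: str, word_b: str, max_distance: int = 5) -> bool:
    """True when word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(
        abs(i - j) <= max_distance for i in positions_a for j in positions_b
    )


if __name__ == "__main__":
    text = "Open access journals are indexed in a directory of scholarly journals"
    print(near(text, "open", "journals", max_distance=2))    # True
    print(near(text, "open", "directory", max_distance=3))   # False
```

A plain AND query would accept both cases, because it only checks that the two words occur somewhere in the same document.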

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
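
Clustering on the fly means that the groups are derived from the retrieved results themselves, at search time. The following deliberately naive sketch groups result titles under the most widely shared word that is not part of the query. It is not the method used by Vivisimo, which is not described in these slides; the stop word list and the sample titles are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to"}


def cluster_results(query: str, titles: list[str]) -> dict[str, list[str]]:
    """Group result titles under the most widely shared non-query word."""
    query_terms = set(re.findall(r"\w+", query.lower()))

    def terms(title: str) -> set[str]:
        return {
            w
            for w in re.findall(r"\w+", title.lower())
            if w not in STOPWORDS and w not in query_terms
        }

    # In how many titles does each word occur?
    doc_freq: Counter[str] = Counter()
    for title in titles:
        doc_freq.update(terms(title))

    clusters: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most frequent word wins; alphabetical order breaks ties.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


if __name__ == "__main__":
    results = [
        "Blaise Pascal, philosopher and mathematician",
        "Pascal programming language tutorial",
        "The philosopher Pascal: a biography",
        "Free Pascal compiler for programming",
    ]
    for label, group in cluster_results("pascal", results).items():
        print(label, "->", group)
```

For the ambiguous query pascal, the four hypothetical titles fall into one group labelled philosopher and one labelled programming, without any preparation of the documents before the search.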

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
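
A minimal sketch of what "multi-threaded" means for a client-based meta-search program: the client sends the same query to several search systems in parallel and collects the answers. The two engine functions below are hypothetical stand-ins that return canned results; they do not contact any real search system, and nothing here describes how Copernic itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_one(query):
    """Hypothetical stand-in for a request to a first search system."""
    return [f"engine-one hit for {query}"]

def engine_two(query):
    """Hypothetical stand-in for a request to a second search system."""
    return [f"engine-two hit for {query}"]

def meta_search(query, engines):
    """Send the query to all engines in parallel, one thread per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        # result() waits until the corresponding engine has answered.
        return {name: future.result() for name, future in futures.items()}

results = meta_search("open access", {"one": engine_one, "two": engine_two})
```

The total waiting time is then roughly that of the slowest engine, instead of the sum of all response times.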

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
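
The integration step can be sketched as follows, assuming that each primary search system returns a ranked list of URLs. The normalisation rule (ignore upper/lower case in the host name, a leading "www." and a trailing slash) is an assumption chosen for illustration; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', no trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize_url(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Invented result lists of two primary search systems.
engine_1 = ["http://www.vivisimo.com/", "http://www.kartoo.com"]
engine_2 = ["http://vivisimo.com", "http://www.mamma.com"]
merged = merge_results([engine_1, engine_2])
```

Here the second spelling of the Vivisimo address is recognised as a repeated result and shown only once.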

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
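
How such a tracking service can distinguish new, modified and unchanged pages can be sketched with a content fingerprint. This is only one possible approach, assumed here for illustration; the slides do not say how any particular service works. The function and variable names are hypothetical, and the page content is passed in directly instead of being downloaded.

```python
import hashlib

def fingerprint(page_content):
    """Return a SHA-256 digest of the page content (bytes)."""
    return hashlib.sha256(page_content).hexdigest()

def check_page(url, page_content, known):
    """Classify a page as 'new', 'modified' or 'unchanged' and remember its fingerprint."""
    new_digest = fingerprint(page_content)
    old_digest = known.get(url)
    known[url] = new_digest
    if old_digest is None:
        return "new"
    return "unchanged" if old_digest == new_digest else "modified"

known_pages = {}  # url -> fingerprint seen at the previous visit
first = check_page("http://example.org/page.html", b"version 1", known_pages)
second = check_page("http://example.org/page.html", b"version 1", known_pages)
third = check_page("http://example.org/page.html", b"version 2", known_pages)
```

An alert service would send an e-mail message to the subscriber whenever the outcome is "new" or "modified".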

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
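
A minimal sketch of matching new books against a stored interest profile, under the assumption that a profile consists of keywords and subject categories as described above. The record fields ("title", "category") and the sample data are invented for illustration and do not reflect the data model of any real bookshop.

```python
def matches_profile(book, profile):
    """True when a title word equals a profile keyword or the category is in the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

def new_book_alerts(new_books, profile):
    """Titles of the newly published books that fit the interest profile."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]

# Hypothetical profile and new-book records.
profile = {"keywords": ["oceanography"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral Reef Fishes", "category": "Marine biology"},
    {"title": "Medieval Cooking", "category": "History"},
]
alerts = new_book_alerts(new_books, profile)
```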

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
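
The procedure described above can be sketched as follows. The small thesaurus is invented for illustration (it is not taken from any of the thesaurus systems mentioned in these slides), and the Boolean syntax with OR, AND and double quotes is an assumption; the target database may require a different query syntax.

```python
# Hypothetical thesaurus entries using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],        # synonym(s)
        "NT": ["lagoon"],       # narrower term(s)
        "BT": ["water body"],   # broader term(s)
        "RT": ["coast"],        # related term(s)
    },
}

def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in expanded:
                expanded.append(other)
    return expanded

def build_query(terms, thesaurus):
    """Combine the expanded terms: OR within a concept, AND between concepts."""
    groups = []
    for term in terms:
        alternatives = " OR ".join(f'"{t}"' for t in expand_term(term, thesaurus))
        groups.append(f"({alternatives})")
    return " AND ".join(groups)

query = build_query(["sea", "pollution"], THESAURUS)
# query == '("sea" OR "ocean" OR "lagoon") AND ("pollution")'
```

Adding synonyms and narrower terms mainly serves recall; choosing the most suitable term from the thesaurus mainly serves precision.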

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
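
Both directions of citation searching can be sketched on a small citation graph. The documents and their reference lists are invented; a real citation index holds the same kind of relation (which document cites which) for millions of publications.

```python
# Hypothetical reference lists: document -> earlier documents that it cites.
REFERENCES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
    "X": ["seed"],
    "Y": ["seed", "A"],
}

def snowball(seed, references, depth=2):
    """Follow reference lists towards older publications, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for document in frontier:
            for cited in references.get(document, []):
                if cited != seed and cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found

def citing_documents(target, references):
    """What a citation index offers: more recent documents that cite the target."""
    return sorted(doc for doc, cited in references.items() if target in cited)

older = snowball("seed", REFERENCES)
newer = citing_documents("seed", REFERENCES)
```

In this example the snowball search leads from the seed document to A, B and C, while the citation index leads to the more recent documents X and Y.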

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
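
Generating these variants can be automated with a small helper. The sketch assumes that the default page of the site is named index.html, as in the example above; other servers may use a different default file name. The function name is hypothetical.

```python
def url_variants(url):
    """Return the three spellings under which the same start page may be linked."""
    base = url.rstrip("/")
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then used in a separate link search, and the results are combined.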

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
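The difference between the two approaches can be illustrated with a small sketch in Python. It works on a tiny, invented collection of pages instead of a real search engine; the names `link_search`, `text_search` and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href values of all <a> elements in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalise(url):
    """Drop the scheme and a trailing slash, and ignore case."""
    return url.split("//", 1)[-1].rstrip("/").lower()


def link_search(target, pages):
    """Specialised search: pages with a hyperlink pointing to the target."""
    wanted = normalise(target)
    hits = []
    for name, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        if any(normalise(href) == wanted for href in collector.hrefs):
            hits.append(name)
    return sorted(hits)


def text_search(target, pages):
    """Normal search: pages whose text contains the URL string anywhere."""
    wanted = normalise(target)
    return sorted(name for name, html in pages.items() if wanted in html.lower())


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
PAGES = {  # invented sample pages
    "a.html": '<p>See <a href="' + TARGET + '">this page</a>.</p>',
    "b.html": "<p>Discussed at " + TARGET + " (no hyperlink).</p>",
    "c.html": "<p>Unrelated text.</p>",
}
print(link_search(TARGET, PAGES))  # ['a.html']
print(text_search(TARGET, PAGES))  # ['a.html', 'b.html']
```

In this sketch the normal text search also finds the page that only mentions the URL without linking to it, which the specialised link search misses.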

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
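The simplified statement above ("the more links received by a site, the higher it ranks") can be sketched as follows in Python. This is only an illustration of counting inbound links, not the actual ranking method used by Google; the function name `rank_by_inlinks` and the sample sites are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Order directory entries by the number of distinct inbound links.

    entries: site names listed in one directory category
    links:   (source, target) pairs found on the WWW
    """
    # Count each (source, target) pair once and ignore links from a site to itself.
    counts = Counter(target for source, target in set(links) if source != target)
    # Most-linked first; ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-counts[site], site))


entries = ["alpha.org", "beta.org", "gamma.org"]
links = [
    ("x.org", "beta.org"),
    ("y.org", "beta.org"),
    ("x.org", "gamma.org"),
    ("beta.org", "beta.org"),  # self-link, ignored
]
print(rank_by_inlinks(entries, links))  # ['beta.org', 'gamma.org', 'alpha.org']
```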

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
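The two components described above (a database with an index built from page texts, and a search function over that index) can be sketched in a few lines of Python. The crawling "robot" is left out: the sample pages are invented and assumed to have been collected already, and combining query words with AND is an assumption of this sketch, not a statement about any real search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND, assumed)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Invented pages standing in for what a crawler would have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science",
}
index = build_index(pages)
print(search(index, "marine science"))  # ['http://example.org/b']
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why such systems are fast but never fully up to date.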

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
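The idea that both the number and the "importance" of the pointing pages matter can be sketched with a simplified link-analysis iteration in the spirit of the published PageRank idea. This is an illustration only and not Google's production algorithm; the function name `link_rank`, the damping value 0.85, the number of iterations and the sample link graph are all assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented link graph: "hub" is cited by three pages; "d" is cited only by
# "hub", while "f" is cited only by the little-cited page "e".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "e": ["f"]}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sketch "d" and "f" each receive exactly one link, yet "d" ends up with the higher score because its single link comes from a more "important" page.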

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
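What such an automatic expansion amounts to can be sketched in Python. The synonym table below is invented for the example, and the function `expand_query` only imitates the effect of the tilde; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "boat": ["ship", "vessel"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + [
                s for s in synonyms.get(word.lower(), []) if s != word
            ]
            if len(alternatives) > 1:
                groups.append("(" + " OR ".join(alternatives) + ")")
            else:
                groups.append(word)  # no synonyms known: keep the word itself
        else:
            groups.append(term)
    return " ".join(groups)


print(expand_query("cheap ~car rental"))
# cheap (car OR automobile OR auto) rental
```

Only the marked word is expanded; the other words of the query are left untouched.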

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
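Recall and precision, the two measures mentioned above, have standard definitions: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved. A minimal sketch in Python, with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 3 documents are actually relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5 (2 of the 4 retrieved documents are relevant)
print(round(recall, 2))  # 0.67 (2 of the 3 relevant documents were retrieved)
```

In practice the full set of relevant documents on the WWW is unknown, so recall can only be estimated.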

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
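To show what manual truncation means, the following Python sketch expands a truncated word such as librar* against a word list and writes the variants out as an OR query, which is one way a searcher could emulate truncation by hand. The word list and the function names are invented for the example.

```python
from fnmatch import fnmatchcase


def expand_truncation(pattern, vocabulary):
    """Return the words in the vocabulary that match a truncated word like 'librar*'."""
    pattern = pattern.lower()
    return sorted(w for w in vocabulary if fnmatchcase(w.lower(), pattern))


def truncated_query(pattern, vocabulary):
    """Write the matching variants out as an explicit OR query."""
    return " OR ".join(expand_truncation(pattern, vocabulary))


# Invented word list; a retrieval system would use the words in its own index.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncated_query("librar*", vocabulary))
# librarian OR libraries OR library
```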

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
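What a proximity operator such as NEAR does can be sketched in Python as follows. The function name `near` and the default maximum distance of 5 words are assumptions; retrieval systems that offer NEAR each define their own distance.

```python
import re


def near(text, word_a, word_b, max_gap=5):
    """Return True if word_a and word_b occur within max_gap words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


text = "open access journals in marine science"
print(near(text, "open", "journals", max_gap=2))  # True
print(near(text, "open", "science", max_gap=2))   # False
```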

96

**--

Meta-search systems:
scheme 1
[Diagram. Components: User (In / Out); client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components: User (In / Out); client computer + multi-threaded
Internet search client program; Internet / WWW;
WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components: User (In / Out); client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
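The relations above can be sketched in Python: a meta-search system sends the same query to several search systems and merges their ranked result lists. The two "engines" below are stand-in functions with invented results, and scoring each URL by the sum of 1/position is only one simple merging choice among many; it is not claimed to be the method of any system named in these slides.

```python
from collections import defaultdict


def meta_search(query, engines):
    """Send one query to several search systems and merge the ranked results."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            # Treat URLs that differ only by a trailing slash as the same page.
            scores[url.rstrip("/")] += 1.0 / position
    # Highest combined score first; ties in alphabetical order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical stand-ins for two Internet search systems.
def engine_1(query):
    return ["http://a.example/", "http://b.example/"]


def engine_2(query):
    return ["http://b.example", "http://c.example"]


print(meta_search("marine science", [engine_1, engine_2]))
# ['http://b.example', 'http://a.example', 'http://c.example']
```

The page found by both systems moves to the top of the merged list, and duplicates are shown only once.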

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
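
To illustrate what clustering search results "on the fly" can mean, here is a deliberately simple sketch that groups result titles under the word they share with most other titles. This is NOT the method used by Vivisimo, which is not described here; the stopword list, the function name and the sample titles are all assumptions made for the illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "at"}


def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = [
            w for w in re.findall(r"[a-z]+", title.lower())
            if w not in STOPWORDS and w not in query_words
        ]
        tokenised.append(words)
    # In how many titles does each word occur?
    frequency = Counter(w for words in tokenised for w in set(words))
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        # Label: the word in this title that occurs in most titles overall.
        label = max(words, key=lambda w: (frequency[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]
for heading, members in cluster_results(titles, "jaguar").items():
    print(heading, "->", members)
```

The grouping is computed only from the retrieved results themselves, so no document has to be processed before the search.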

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (in / out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
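
The principle of a multi-threaded search client can be sketched as follows. This is a minimal illustration and not the way Copernic works internally; the two "search engines" are hypothetical stand-in functions that return fixed results, where a real client would send the query over the network and parse the answers.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real Internet search systems.
def search_engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def search_engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel and collect the hits."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # map() returns the result lists in the order of `engines`,
        # whichever thread happens to finish first.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return dict(zip((engine.__name__ for engine in engines), result_lists))


hits = meta_search("tides", [search_engine_a, search_engine_b])
for engine_name, urls in hits.items():
    print(engine_name, urls)
```

Because the requests run side by side, the total waiting time is roughly that of the slowest source instead of the sum of all sources.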

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
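
A minimal sketch of such an integration with removal of repeated results. The normalisation rules (ignore upper/lower case in the host name, a leading "www." and a trailing slash) are assumptions chosen for the illustration; real systems may apply other rules.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Merge several result lists, keeping only the first copy of each URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


merged = merge_results([
    ["http://www.example.org/a/", "http://example.org/b"],
    ["http://example.org/a", "http://Example.org/c"],
])
print(merged)
```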

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: the Internet, with the WWW as one part and telnet, ftp, ... as other parts. Elements shown:
- Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
- Word files, PDF files
- Databases and file archives accessible through the Internet (CGI, ASP, ...)
- Information accessible only when passwords are used
- Rapidly changing information, such as news]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
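
How tracking of modified and new pages can work in principle is sketched below, assuming that the content of each page has already been downloaded. A fingerprint of each page is stored, and at the next visit the fingerprints are compared. The function names are illustrative; this is not the mechanism of any particular alert service.

```python
import hashlib


def fingerprint(page_content):
    """Return a short, fixed-length summary of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def detect_changes(stored, current_pages):
    """Compare stored fingerprints with the current pages.

    `stored` maps URL -> fingerprint from the previous visit and is updated
    in place. Returns two lists of URLs: (changed, new).
    """
    changed, new = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new


stored = {}
print(detect_changes(stored, {"http://example.org/": "version 1"}))
print(detect_changes(stored, {"http://example.org/": "version 2",
                              "http://example.org/news": "first issue"}))
```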

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
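
A minimal sketch of matching new books against a stored interest profile of keywords. The record layout (a title plus a list of subject terms) and the sample data are assumptions for the illustration and do not reflect any bookshop's actual system.

```python
def matching_titles(profile_keywords, new_books):
    """Return the titles of new books that fit the interest profile."""
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        subjects = {s.lower() for s in book.get("subjects", [])}
        # A book matches when a keyword occurs in its title or subject terms.
        if keywords & (title_words | subjects):
            alerts.append(book["title"])
    return alerts


new_books = [
    {"title": "Marine Ecology Basics", "subjects": ["oceanography"]},
    {"title": "Cooking for Beginners", "subjects": ["food"]},
    {"title": "Coastal Management", "subjects": ["Ecology", "policy"]},
]
print(matching_titles(["ecology"], new_books))
```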

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
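
The procedure described above can be sketched as follows, assuming a thesaurus stored as a simple dictionary with BT / NT / RT / UF relations and a target database that accepts OR between terms. The thesaurus entry for "sea" is invented for the illustration and is not taken from any real thesaurus.

```python
# Invented example entry; a real thesaurus would supply these relations.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Build an OR query from a term and the thesaurus terms related to it."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms, narrower and related terms tends to raise recall; choosing which relations to follow is left to the searcher, since broader terms may lower precision.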

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
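
Both directions of citation searching can be sketched with a small table of reference lists. The document names and citation links below are invented for the illustration; a real citation index holds the same kind of links for millions of publications.

```python
# Invented data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "old2"],
}


def backward(doc, cites, depth):
    """Snowball searching: follow reference lists `depth` levels into the past."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for ref in cites.get(current, []):
                if ref != doc and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def forward(doc, cites):
    """Citation index lookup: which documents cite `doc`?"""
    return sorted(citing for citing, refs in cites.items() if doc in refs)


print(backward("seed", CITES, depth=2))
print(forward("seed", CITES))
```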

211

***-

Information retrieval:
using citations (scheme)
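[Diagram: a time axis running from past through now to future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.]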

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
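
A minimal sketch of how such counts can be derived from reference lists, per article and per author. All names and data are invented for the illustration; real counts depend entirely on which publications the database covers, so they remain estimates.

```python
from collections import Counter

# Invented data: citing article -> articles in its reference list.
REFERENCE_LISTS = {
    "paper_c": ["paper_a", "paper_b"],
    "paper_d": ["paper_a"],
    "paper_e": ["paper_a", "paper_a"],  # duplicate reference, counted once
}
# Invented data: article -> its authors.
AUTHORS = {
    "paper_a": ["Author X"],
    "paper_b": ["Author X", "Author Y"],
}


def citation_counts(reference_lists):
    """Count how many different articles cite each article."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))
    return counts


def author_counts(article_counts, authors):
    """Add up the citations received by all articles of each author."""
    totals = Counter()
    for article, number in article_counts.items():
        for author in authors.get(article, []):
            totals[author] += number
    return totals


per_article = citation_counts(REFERENCE_LISTS)
per_author = author_counts(per_article, AUTHORS)
print(dict(per_article))
print(dict(per_author))
```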

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
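
The idea of counting links to a page can be sketched with the Python standard library. The three pages and their addresses are invented for illustration; a real search engine does this over its whole database of collected pages.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def pages_linking_to(target, pages):
    """Return the addresses of all pages that contain a link to target."""
    citing = []
    for address, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            citing.append(address)
    return sorted(citing)

# Hypothetical mini-web: page address -> HTML source.
pages = {
    "http://one.example/": '<p><a href="http://target.example/paper.html">a paper</a></p>',
    "http://two.example/": '<p><a href="http://other.example/">elsewhere</a></p>',
    "http://three.example/": '<p>Comment on <a href="http://target.example/paper.html">this</a></p>',
}

citing_pages = pages_linking_to("http://target.example/paper.html", pages)
web_citation_count = len(citing_pages)
```

The list of citing pages answers "who links to this page", and its length is the web citation count.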

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
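
Generating these variants can be automated. The following is a minimal sketch; the function name and the list of index file names are assumptions for illustration, not features of any search system.

```python
def url_variants(url):
    """Return the common spellings of one web address for a link search."""
    # Drop the scheme so that only the host and path remain.
    bare = url.split("://", 1)[-1]
    variants = [bare]
    # A trailing index file name is often omitted by people who make a link.
    for index_name in ("index.html", "index.htm"):
        if bare.endswith("/" + index_name):
            bare = bare[: -len(index_name)]
            variants.append(bare)
            break
    # The directory may be written with or without the final slash.
    if bare.endswith("/"):
        variants.append(bare.rstrip("/"))
    return variants

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query and the results combined.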

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
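
The difference with link searching can be shown in a small sketch: a plain string search also finds pages that mention the URL in running text without making a hyperlink. The page addresses and texts are hypothetical.

```python
# Hypothetical page sources: address -> text as stored by a search system.
pages = {
    "http://site-one.example/links.html":
        'See <a href="http://www.computer.com/directory/subdirectory/web-page.html">this page</a>.',
    "http://site-two.example/notes.html":
        "Discussed at http://www.computer.com/directory/subdirectory/web-page.html last year.",
    "http://site-three.example/":
        "Nothing relevant here.",
}

def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the exact URL string."""
    return sorted(address for address, text in pages.items() if url in text)

known_url = "http://www.computer.com/directory/subdirectory/web-page.html"
mentions = pages_mentioning(known_url, pages)
```

Here the second page is found although it contains no hyperlink, which a search restricted to link fields would miss.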

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
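
The "simply stated" version of this ranking can be sketched as follows. The sites and link data are hypothetical, and the sketch only counts inbound links; it is an illustration of the idea, not Google's actual method.

```python
from collections import Counter

# Hypothetical directory entries in one category.
entries = ["site-a.example", "site-b.example", "site-c.example"]

# Hypothetical link data: (linking site, linked site).
links = [
    ("x.example", "site-b.example"),
    ("y.example", "site-b.example"),
    ("z.example", "site-c.example"),
]

def rank_by_inbound_links(entries, links):
    """Order directory entries by number of inbound links, most linked first."""
    inbound = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order, as in a plain directory listing.
    return sorted(entries, key=lambda site: (-inbound[site], site))

ranked = rank_by_inbound_links(entries, links)
```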

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
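
The two components described above can be sketched in a few lines: an index built from page texts and a search function that consults only that index. The three pages are hypothetical stand-ins for what a robot would collect; real systems add ranking, storage and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages as collected by a robot: URL -> text.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
    "http://c.example/": "Biology of fresh water",
}

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
hits = search(index, "marine biology")
```

The query never touches the pages themselves, only the index, which is why a search is fast but can return pages that have changed or disappeared since they were collected.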

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
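
Both ranking criteria (many pages point to it, "important" pages point to it) can be illustrated with a simplified link-analysis iteration in the style of the published PageRank idea. This is a sketch under assumptions: the four-page link graph, the damping value and the number of iterations are chosen for illustration, and the ranking actually used by Google is more elaborate.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links they receive, weighted by the linker's score."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = link_rank(links)
```

In this graph C scores highest because three pages point to it, and A scores above B and D although it receives only one link, because that link comes from the "important" page C.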

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
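
What such an automatic expansion amounts to can be sketched as a rewrite of the query into an OR group. The synonym table is a hypothetical stand-in; which synonyms Google actually uses is not documented here.

```python
# Hypothetical synonym table; a real system uses a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms):
    """Replace each ~word by a group (word OR synonym OR ...)."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: search for the word itself.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

expanded = expand_query("cheap ~car rental", SYNONYMS)
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table knows a synonym for it.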

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
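
Why several indexes are needed for exhaustive results can be shown with a small sketch that compares result sets. The three result sets are invented; real overlap figures vary by query and over time.

```python
# Hypothetical result sets for one query from three different indexes.
results = {
    "index_1": {"u1", "u2", "u3", "u4"},
    "index_2": {"u3", "u4", "u5"},
    "index_3": {"u1", "u6"},
}

def coverage_report(results):
    """Return the union of all results and each index's share of that union."""
    union = set().union(*results.values())
    shares = {name: len(found) / len(union) for name, found in results.items()}
    return union, shares

union, shares = coverage_report(results)
```

In this example no single index returns more than two thirds of the documents found by the three together.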

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
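
Recall and precision, the two measures named on this slide, can be computed as in the following minimal sketch. The document identifiers are hypothetical, and in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d3", "d4", "d5", "d6"}
precision, recall = precision_recall(retrieved, relevant)
```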

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
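
What manual truncation does in systems that support it can be sketched as follows. The `*` symbol and the small vocabulary are assumptions for illustration; the truncation symbol differs between retrieval systems.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as librar* into matching index words."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    return [pattern] if pattern in vocabulary else []

# Hypothetical words present in an index.
vocabulary = {"library", "libraries", "librarian", "liberty"}
matches = expand_truncation("librar*", vocabulary)
```

Without this feature the searcher has to type the variants and combine them with OR.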

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
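
To illustrate what clustering "on the fly" can mean, here is a toy sketch that groups the titles of retrieved results at search time, without any index built beforehand. This is NOT the algorithm of Vivisimo, which is not described here; the stopword list, the sample titles and the function names are assumptions made only for this illustration.

```python
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not taken from any real system).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def keywords(title, query):
    """Words of a title, minus stopwords and the words of the query itself."""
    query_words = set(query.lower().split())
    return [w for w in title.lower().split()
            if w not in STOPWORDS and w not in query_words]


def cluster_results(query, titles):
    """Group result titles under the keyword they share with most other titles."""
    # Count in how many titles each keyword occurs.
    counts = Counter(w for t in titles for w in set(keywords(t, query)))
    clusters = defaultdict(list)
    for title in titles:
        words = sorted(keywords(title, query))  # sorted: ties are broken alphabetically
        label = max(words, key=lambda w: counts[w]) if words else "other"
        if label != "other" and counts[label] < 2:
            label = "other"  # a keyword used by only one title is not a useful heading
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "jaguar".
TITLES = [
    "Jaguar cars dealer",
    "Jaguar cars history",
    "Jaguar habitat rainforest",
    "Jaguar habitat threats",
    "Jaguar operating system",
]

for heading, members in cluster_results("jaguar", TITLES).items():
    print(heading, "->", members)
```

All the work happens after the results have been retrieved, which is the point made in the slide above.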

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
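
The time saving can be sketched as follows: one query is sent to several search systems in parallel, instead of one after the other. The two "engines" below are hypothetical stand-ins that return fixed results; a real meta-search system would contact remote search services over the WWW.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical search system A (stand-in for a remote service)."""
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    """Hypothetical search system B (stand-in for a remote service)."""
    return [f"http://b.example/{query}/1"]


ENGINES = {"engine_a": engine_a, "engine_b": engine_b}


def meta_search(query, engines=ENGINES):
    """Send the same query to all engines at the same time and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


print(meta_search("ocean"))
```

The user formulates the query only once, through a single interface.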

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
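
A minimal sketch of such an integration, assuming that each primary search system returns a ranked list of URLs. The normalisation rule (ignore a leading "www." and a trailing slash) is an assumption chosen for this example; real systems may compare results in other ways.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', path without trailing '/'."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave several ranked lists by rank and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
list_a = ["http://www.dogpile.com/", "http://vivisimo.com/"]
list_b = ["http://dogpile.com", "http://www.kartoo.com"]

print(merge_results([list_a, list_b]))
```

Here the two spellings of the Dogpile address are recognised as one result and shown only once.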

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
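
How such a tracking program or alert service could distinguish modified pages from new pages can be sketched as follows: a fingerprint of each page is stored and compared at the next visit. This is only an illustration under assumptions; the function names are hypothetical, the page texts are given directly instead of being downloaded, and it is not the method of Copernic or of any particular alert service.

```python
import hashlib


def fingerprint(page_text):
    """Hash of the page text, ignoring differences in whitespace only."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_updates(stored, current_pages):
    """Compare pages with the stored fingerprints.

    stored: dict url -> fingerprint from the previous visit (updated in place)
    current_pages: dict url -> page text as retrieved now
    Returns two lists: modified URLs and new URLs.
    """
    modified, new = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            modified.append(url)
        stored[url] = digest
    return modified, new


stored = {}
print(check_for_updates(stored, {"http://site.example/a": "First version"}))
print(check_for_updates(stored, {"http://site.example/a": "Second version",
                                 "http://site.example/b": "Another page"}))
```

An alert service would then send the lists of modified and new pages to the subscriber by email.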

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
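
A minimal sketch of how a stored interest profile could be matched against a newly published book. The data structure, the function name and the sample book are hypothetical; the sketch does not describe how Amazon or any other service implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Interest profile of one user, as stored on the server."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, category):
    """True when the book title contains a profile keyword or the book is in a profile category."""
    title_words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


profile = InterestProfile(keywords={"aquaculture"}, categories={"Oceanography"})

# Hypothetical new book: the title contains the keyword, so an alert would be sent.
print(matches(profile, "Aquaculture: an introduction", "Agriculture"))
```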

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)

To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general | Library of Congress, British Library, Infoball

To find the price of a book | Global Books in Print, Infoball, online bookshops

To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
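
The procedure above can be sketched with a small data structure that holds the thesaurus relations BT, NT, RT and UF. The entry for "sea" is a hypothetical example, not copied from an existing thesaurus, and the sketch assumes that the target database accepts OR between terms and double quotes around phrases.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym for which "sea" is used
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the chosen relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Phrases of several words are put between double quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms is aimed at higher recall; replacing a vague word by a more suitable term from the thesaurus is aimed at higher precision.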

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
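
Both directions can be sketched with a toy collection of documents and their reference lists. The document names and the functions are hypothetical; a real citation index is built from the reference lists of a very large number of publications.

```python
# Hypothetical collection: each document with the list of documents that it cites.
REFERENCES = {
    "paper_2005": ["paper_2001", "paper_1998"],
    "paper_2003": ["paper_2001"],
    "paper_2001": ["paper_1998", "paper_1995"],
    "paper_1998": [],
    "paper_1995": [],
}


def snowball(seed, references):
    """Backward in time: follow the reference lists, starting from the seed document."""
    found, todo = [], list(references.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Forward in time: for each cited document, list the documents that cite it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("paper_2001", REFERENCES))
print(build_citation_index(REFERENCES)["paper_2001"])
```

The length of a list in the citation index gives the number of citations received by that document within the collection.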

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
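The variants listed above can be generated mechanically before running a link search. The following is only a minimal sketch: the function name `url_variants` and the list of default document names are assumptions made for illustration, and the exact query syntax still depends on the search system used.

```python
def url_variants(url):
    """Return common spellings of the same web address, without the scheme."""
    # Drop the scheme so the variants match the "//host/path" style used above.
    rest = url.split("://", 1)[-1]
    rest = rest.rstrip("/")
    variants = []
    # If the address ends in a default document name (an assumed list),
    # keep that full form and also derive the bare directory forms.
    for default_name in ("index.html", "index.htm"):
        if rest.endswith("/" + default_name):
            variants.append(rest)
            rest = rest[: -len(default_name) - 1]
            break
    variants.append(rest + "/")  # directory with trailing slash
    variants.append(rest)        # directory without trailing slash
    return variants
```

Each returned string can then be searched separately and the result sets combined.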

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 36

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
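The "simply stated" rule above (more links received, higher rank) can be illustrated with a small sketch. This is only an illustration of counting incoming links, not Google's actual method; the function name `rank_by_inlinks` and the input format are assumptions.

```python
from collections import Counter

def rank_by_inlinks(links):
    """links: iterable of (source_page, target_site) pairs.

    Returns the target sites, most linked first.
    """
    # Count each distinct (source, target) pair only once.
    counts = Counter(target for _source, target in set(links))
    # Sort by descending number of incoming links; ties are broken alphabetically.
    return sorted(counts, key=lambda site: (-counts[site], site))
```

For example, a site that receives links from three different pages is listed before a site that receives links from two.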

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically, by machine, on the basis of the contents, the
texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
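The two components described above (an index built from page texts, and a search function that consults only that index) can be sketched as follows. This is a minimal illustration with assumed names (`build_index`, `search`) and made-up example URLs; real systems add crawling, ranking and much more.

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> page text. Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Split the text into lower-case words and record where each occurs.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return sorted URLs of pages containing every query word (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

Note that `search` never looks at the pages themselves, only at the index built beforehand, which is why such a system can answer quickly but may be out of date.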

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
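The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified iterative link score. This is an assumption-laden illustration in the spirit of link analysis, not the algorithm actually used by Google; the function name `link_scores`, the damping value 0.85 and the number of iterations are all chosen only for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to.

    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores
```

In this sketch a link from a page that itself receives many links counts for more than a link from a page that receives none.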

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
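What such an automatic expansion amounts to can be sketched with a small synonym table. How Google selects synonyms internally is not described here; the function name `expand_query` and the synonym table passed to it are purely hypothetical.

```python
def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its listed synonyms.

    synonyms: dict mapping a word -> list of synonyms (a hypothetical table).
    """
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + list(synonyms.get(word, []))
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is simply searched as it is.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

With the table `{"pollution": ["contamination"]}`, the query `marine ~pollution data` becomes `marine (pollution OR contamination) data`.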

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
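Recall and precision, mentioned above as the two measures of a result set, can be computed when the set of relevant documents is known. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the function name and the document identifiers in the example are assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set, given the relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a search retrieves 4 documents of which 2 are relevant, while 3 relevant documents exist in total, precision is 2/4 and recall is 2/3.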

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
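
When a search system offers no truncation, a searcher can approximate it by writing out the word forms and combining them with OR. The sketch below illustrates that idea only; the function names and the small word list are assumptions made for this example, not a feature of any search engine.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching words."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def build_or_query(term, vocabulary):
    """Join the expanded word forms with OR, so that no truncation is needed."""
    return " OR ".join(expand_truncation(term, vocabulary))


# Hypothetical word list; in practice it could come from a dictionary or thesaurus.
vocabulary = ["library", "libraries", "librarian", "librarians", "liberty"]
print(build_or_query("librar*", vocabulary))
```

The resulting query string can then be pasted into a system that supports OR but not truncation.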

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
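
To show what a proximity operator such as NEAR does, here is a minimal sketch that tests whether two words occur within a given number of words of each other. The function name `near`, the default distance and the two sample sentences are assumptions for illustration; real systems evaluate proximity inside their index rather than on raw text.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc1 = "Marine pollution along the Belgian coast has been studied."
doc2 = "Pollution is a problem. Much later, the text mentions a marine museum."

print(near(doc1, "marine", "pollution", max_distance=3))
print(near(doc2, "marine", "pollution", max_distance=3))
```

Both documents contain both words, so a plain AND search retrieves both; a proximity search keeps only the first, which is why the operator can improve precision.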

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
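
Clustering on the fly can be illustrated with a very simple method: group the retrieved snippets under the term that they share most widely. This is only a sketch under that assumption and is not the algorithm used by Vivisimo, which is not described here; the stop word list, the family name and the snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "in", "and", "at", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared term they contain."""
    query_terms = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        terms = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(terms - STOPWORDS - query_terms)

    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for terms in term_sets for term in terms)

    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        # The most shared term becomes the heading; ties are broken alphabetically.
        label = min(terms, key=lambda t: (-doc_freq[t], t)) if terms else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical results of a search for a family name.
snippets = [
    "Peeters wins cycling race in Gent",
    "Cycling champion Peeters interviewed",
    "Professor Peeters publishes chemistry paper",
    "Chemistry lecture by Peeters at university",
]
clusters = cluster_results("peeters", snippets)
for heading, members in clusters.items():
    print(heading, "->", members)
```

Nothing is computed before the search: the headings are derived only from the snippets that the query returned.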

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
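
The time saving comes from sending one query to several search systems at the same moment instead of one after the other. A minimal sketch of that "multi-threaded" approach follows; the two engine functions are hypothetical stand-ins that return fixed lists, since calling real search systems would require their own interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_engine_a(query):
    return ["http://example.org/a1", "http://example.org/shared"]


def search_engine_b(query):
    return ["http://example.org/shared", "http://example.org/b1"]


def meta_search(query, engines):
    """Send one query to all engines in parallel and collect results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {"A": search_engine_a, "B": search_engine_b}
results = meta_search("open access", engines)
print(results)
```

The total waiting time is then roughly that of the slowest engine rather than the sum of all of them.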

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
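
Integrating results with removal of repeats can be sketched as follows. The normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) is an assumption chosen for this example; real meta-search systems may compare results in other ways, and this sketch ignores query strings.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_a = ["http://www.dogpile.com/", "http://vivisimo.com/"]
list_b = ["http://dogpile.com", "http://www.kartoo.com"]
merged = merge_results([list_a, list_b])
print(merged)
```

Interleaving by rank keeps the top result of every source near the top of the merged list.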

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
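
Tracking changes in known pages can be done by storing a fingerprint of each page and comparing it at the next visit. The sketch below assumes that the page texts have already been fetched; the URLs and texts are invented, and fetching pages or sending e-mail alerts is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current content of each page.

    previous:      dict mapping URL -> fingerprint from the last check
    current_pages: dict mapping URL -> page text fetched now
    Returns (new_urls, changed_urls, updated_fingerprints).
    """
    new_urls, changed_urls, updated = [], [], {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        updated[url] = digest
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != digest:
            changed_urls.append(url)
    return new_urls, changed_urls, updated


# Hypothetical state from the last check and the pages as they are now.
stored = {"http://example.org/news": fingerprint("old text")}
current_pages = {
    "http://example.org/news": "new text",
    "http://example.org/report": "first version",
}
new_urls, changed_urls, stored = detect_changes(stored, current_pages)
print("New:", new_urls)
print("Changed:", changed_urls)
```

Only the fingerprints need to be kept between checks, not the full page texts.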

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
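
The procedure described on this slide (look up terms in a thesaurus first, then formulate the query) can be sketched as follows. The tiny thesaurus, its entries and the choice to use only UF and NT relations by default are assumptions made for illustration.

```python
# A tiny hypothetical thesaurus; real systems hold many thousands of terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms mainly raises recall; adding related or broader terms widens the search further but can lower precision.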

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
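
Both directions of citation searching can be shown on a small data set. In this sketch each document lists the earlier documents it cites; following those lists gives the snowball search into the past, and inverting them gives a citation index that points to more recent work. The document names and reference lists are invented.

```python
from collections import defaultdict

# Hypothetical data: each document lists the earlier documents it cites.
references = {
    "seed_2000": ["old_1990", "old_1995"],
    "old_1995": ["old_1990"],
    "new_2003": ["seed_2000"],
    "new_2005": ["seed_2000", "new_2003"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


def snowball(seed, references):
    """Follow references backwards in time, transitively, from the seed."""
    found, to_visit = set(), [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


cited_by = build_citation_index(references)
print(sorted(snowball("seed_2000", references)))   # older publications
print(sorted(cited_by.get("seed_2000", [])))       # more recent publications
```

The reference lists are available in the documents themselves, whereas the inverted index has to be built from a large collection, which is why citation indexes are offered as separate services.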

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
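As an illustration of how such counts can be derived from the results of a citation search, here is a small sketch. The article identifiers and author names are hypothetical, and a real count depends on how completely the citation index covers the literature.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article) pairs from a citation search.
citations = [("P4", "P1"), ("P5", "P1"), ("P5", "P2"), ("P6", "P1"), ("P6", "P3")]
# Hypothetical authorship of the cited articles.
author_of = {"P1": "Smith", "P2": "Smith", "P3": "Jones"}

# Citations received by each article.
per_article = Counter(cited for _, cited in citations)

# Citations received by each author = sum over that author's articles.
per_author = Counter()
for article, count in per_article.items():
    per_author[author_of[article]] += count
```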

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
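The variants can be generated mechanically. The sketch below is only an illustration: the function name is hypothetical, and treating "index.htm" as a second default file name is an assumption not made on the slide.

```python
def url_variants(url):
    """Return the spellings of a URL that other pages might link to."""
    base = url.split("//", 1)[-1]          # drop the scheme, keep host and path
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]    # cut the file name, keep the slash
    base = base.rstrip("/")
    return ["//" + base + "/index.html", "//" + base + "/", "//" + base]

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query to the search system of your choice.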

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 37

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
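The "simply stated" rule can be sketched as follows. This only illustrates counting received links; it is not Google's actual ranking method, and the site names and link data are hypothetical.

```python
# Hypothetical link data: page -> pages it links to.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}

def rank_by_inlinks(entries, links):
    """Order directory entries by the number of links they receive."""
    inlinks = {entry: 0 for entry in entries}
    for source, targets in links.items():
        for target in targets:
            if target in inlinks and target != source:
                inlinks[target] += 1
    # Most-linked first; alphabetical order breaks ties.
    return sorted(entries, key=lambda e: (-inlinks[e], e))

ranking = rank_by_inlinks(["a.example", "c.example", "d.example"], links)
```

A purely alphabetical directory would list a.example first; with link counting it drops to the last position because no page links to it.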

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
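The two components can be sketched in a few lines. This is a toy model under stated assumptions: the "web" is a hypothetical in-memory dictionary (no network access), and combining the query words with AND is a choice made for this sketch only.

```python
import re
from collections import defaultdict

# Hypothetical "web": URL -> (page text, outgoing links).
WEB = {
    "http://site.example/": ("marine science portal", ["http://site.example/fish"]),
    "http://site.example/fish": ("fish and marine biology", ["http://site.example/"]),
    "http://site.example/orphan": ("unlinked page about fish", []),
}

def crawl_and_index(start_url):
    """Robot: follow links from start_url and build a word -> URLs index."""
    index, seen, todo = defaultdict(set), set(), [start_url]
    while todo:
        url = todo.pop()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
        todo.extend(links)
    return index

def search(index, query):
    """Answer a query from the stored index only, never from the live pages."""
    results = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*results) if results else set()

index = crawl_and_index("http://site.example/")
hits = search(index, "marine fish")
```

Note that the page nobody links to is never collected by the robot, so it cannot be retrieved, even though it contains a query word.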

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
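The idea that links from "important" pages count for more can be illustrated with a simplified iterative link-analysis score. This is a sketch in the spirit of published link-analysis methods, not Google's actual algorithm; the page names, the damping value of 0.85 and the number of rounds are assumptions.

```python
def link_scores(links, rounds=20, damping=0.85):
    """Iteratively score pages: a page passes its score on to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            # A page without outgoing links shares its score with all pages.
            targets = links.get(page) or list(pages)
            for target in targets:
                new[target] += damping * score[page] / len(targets)
        score = new
    return score

# Hypothetical link structure: "hub" receives three links and links to "x";
# "y" receives one link from the little-linked page "d".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["x"], "d": ["y"]}
scores = link_scores(links)
```

Here "x" and "y" each receive exactly one link, but "x" ends up with the higher score because its link comes from the heavily linked "hub".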

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
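What such an automatic expansion amounts to can be sketched as below. The mini-thesaurus and the function name are hypothetical, and the output format (an OR-group in parentheses) is only one way to write the expanded query; how Google handles the tilde internally is not described here.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
SYNONYMS = {"car": ["automobile", "auto"], "cheap": ["inexpensive"]}

def expand_query(query):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + SYNONYMS.get(word.lower(), [])
            parts.append("(" + " OR ".join(group) + ")" if len(group) > 1 else word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("used ~car prices")
```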

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
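
A small sketch of the two measures named on this slide, as they are usually defined: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The document identifiers and the two result lists are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: the same query sent to two systems.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```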

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
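
When a search system offers no truncation, the searcher can expand the word variants by hand. The sketch below shows the idea, assuming the target system accepts OR between terms; the word list is a hypothetical vocabulary, not data from any search engine.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def or_query(words):
    """Join the word variants into one query string."""
    return " OR ".join(words)


# Hypothetical vocabulary known to the searcher.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(or_query(expand_truncation("librar*", vocabulary)))
# librarian OR libraries OR library
```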

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
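
To make clear what a proximity operator does, here is a minimal sketch of a NEAR test applied to a text that has already been retrieved. The function name and the default distance are assumptions for illustration; this is not how any particular search engine implements the operator.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sentence = "Open access journals in marine science"
print(near(sentence, "marine", "science", max_distance=1))  # True
print(near(sentence, "open", "science", max_distance=3))    # False
```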

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
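
The idea of clustering on the fly can be sketched as follows. This is a deliberately simple illustration and NOT the method used by Vivisimo, which is not described here: each hit is filed under the word (outside the query) that it shares with the largest number of other hits. The stop word list and the sample hits are invented.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}


def cluster_results(results, query):
    """Group (title, snippet) hits under their most widely shared non-query word."""
    query_words = set(query.lower().split())
    doc_words = []
    counts = defaultdict(int)  # word -> number of hits containing it
    for title, snippet in results:
        words = {w.strip(".,;:!?").lower() for w in (title + " " + snippet).split()}
        words -= STOPWORDS | query_words
        words.discard("")
        doc_words.append(words)
        for w in words:
            counts[w] += 1
    clusters = defaultdict(list)
    for (title, _), words in zip(results, doc_words):
        # Most widely shared word wins; ties are broken alphabetically.
        label = min(words, key=lambda w: (-counts[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical hits for the ambiguous query "jaguar".
hits = [
    ("Jaguar cars", "Luxury car dealer"),
    ("Jaguar dealer", "New car prices"),
    ("Jaguar habitat", "Big cat of the rainforest"),
    ("Jaguar facts", "The cat hunts at night"),
]
print(cluster_results(hits, "jaguar"))
```

No document is processed before the search: the grouping uses only the titles and snippets that come back with the result list.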

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
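
The time saving comes from sending one query to several systems in parallel. A minimal sketch, assuming two stand-in functions (`engine_a`, `engine_b`) in place of real search systems; the example.* addresses are placeholders, and no real search engine interface is used or implied.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical search system A (stand-in for a real remote call)."""
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def engine_b(query):
    """Hypothetical search system B (stand-in for a real remote call)."""
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to all engines at once; one result list per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the result lists in the same order as the engines.
        return list(pool.map(lambda engine: engine(query), engines))


print(meta_search("tides", [engine_a, engine_b]))
```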

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
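
Integration with removal of repeated results can be sketched as follows. The normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) is an assumption made for this example; real meta-search systems may compare results differently.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (assumed rule, see lead-in)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Interleave ranked lists (rank 1 of each, then rank 2, ...) without repeats."""
    seen, merged = set(), []
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is None:
                continue
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


list_1 = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/a"]
list_2 = ["http://vub.ac.be/BIBLIO", "http://example.org/b"]
print(merge_results(list_1, list_2))
# ['http://www.vub.ac.be/BIBLIO/', 'http://example.org/a', 'http://example.org/b']
```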

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
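
Tracking changes in a known page can be reduced to comparing a stored fingerprint of the text with a fingerprint of the newly fetched text. A minimal sketch, assuming the page text has already been downloaded by some other means; the function names and the sample text are illustrative.

```python
import hashlib


def fingerprint(page_text):
    """Hash the text after collapsing whitespace, so layout changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, new_page_text):
    """Compare a newly fetched text with the fingerprint stored earlier."""
    return fingerprint(new_page_text) != stored_fingerprint


stored = fingerprint("Opening hours: 9-17")
print(has_changed(stored, "Opening  hours:\n9-17"))  # False (only whitespace differs)
print(has_changed(stored, "Opening hours: 9-18"))    # True
```

Finding new pages is a different task: it needs an index of the WWW, which is why the alert services rely on an Internet search system.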

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
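
The use of a thesaurus before searching can be sketched as follows. The thesaurus entry is invented for illustration and is not taken from any of the systems mentioned; the sketch also assumes that the database to be searched accepts OR between terms and double quotes around phrases.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Add the terms found through the chosen relations to the original term."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Phrases of more than one word are put between double quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
# sea OR seas OR "coastal sea"
print(expand_query("sea", THESAURUS, relations=("UF", "NT", "RT")))
# sea OR seas OR "coastal sea" OR ocean
```

Adding synonyms and narrower terms mainly raises recall; choosing a narrower term instead of the original term mainly raises precision.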

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
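
Both directions of citation searching can be sketched on a small citation graph. The documents and their reference lists are invented; the function names are illustrative only.

```python
# Hypothetical data: document -> list of earlier documents that it cites.
CITES = {
    "seed_2000": ["old_1990", "old_1995"],
    "old_1995": ["old_1985"],
    "new_2003": ["seed_2000"],
    "new_2004": ["seed_2000", "old_1995"],
}


def snowball(seed, cites):
    """Backward search: follow reference lists again and again (snowball effect)."""
    found, stack = set(), [seed]
    while stack:
        for reference in cites.get(stack.pop(), []):
            if reference not in found:
                found.add(reference)
                stack.append(reference)
    return found


def cited_by(target, cites):
    """Forward search, as in a citation index: who cites the target document?"""
    return sorted(doc for doc, refs in cites.items() if target in refs)


print(sorted(snowball("seed_2000", CITES)))  # ['old_1985', 'old_1990', 'old_1995']
print(cited_by("seed_2000", CITES))          # ['new_2003', 'new_2004']
```

The backward search needs only the documents themselves; the forward search needs a database in which the reference lists of many documents have been indexed.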

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
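
As a rough sketch, counting citations per article and per author from the results of a citation search could look as follows. The records in CITATIONS are hypothetical; a real count depends on the coverage of the database that was searched.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of cited article)
CITATIONS = [
    ("p3", "p1", "Author A"),
    ("p4", "p1", "Author A"),
    ("p4", "p2", "Author A"),
    ("p5", "p6", "Author B"),
]

# Number of citations received by each article.
per_article = Counter(cited for _, cited, _ in CITATIONS)
# Number of citations received by each author, summed over all articles.
per_author = Counter(author for _, _, author in CITATIONS)
```

Such counts are only estimates: citations from sources that are not indexed are missed.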

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
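
A small helper can generate these variants so that none is forgotten. This is a sketch that assumes the default page of the site is called index.html or index.htm; the function name url_variants is made up for this illustration.

```python
def url_variants(url):
    """Return the spellings of a URL that other pages may have linked to."""
    base = url
    # Assumption: the default page of the site is index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant can then be used in a separate link search, following the query syntax of the chosen search system.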

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!




2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
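
Browsing such a tree structure can be sketched as walking down nested categories. The categories and the URL below are hypothetical and do not reproduce the classification of any real directory.

```python
# Hypothetical fragment of a subject directory: categories contain
# sub-categories, and the lowest level holds a list of selected sites.
DIRECTORY = {
    "Science": {
        "Earth sciences": {
            "Oceanography": ["http://example.org/ocean-data"],
        },
        "Biology": {
            "Marine biology": [],
        },
    },
}


def browse(directory, path):
    """Follow a path of category names down the tree, one click per level."""
    node = directory
    for category in path:
        node = node[category]
    return node


sites = browse(DIRECTORY, ["Science", "Earth sciences", "Oceanography"])
```

No query has to be formulated: the user only chooses among the categories offered at each level.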

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
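
The core of this mechanism, an index from words to pages that is built in advance and then queried, can be sketched as follows. The pages are hypothetical stand-ins for what a robot would have collected; real systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected.
PAGES = {
    "http://example.org/a": "Marine science and oceanography sources",
    "http://example.org/b": "Open access journals for marine biology",
    "http://example.org/c": "Civil engineering directory",
}


def build_index(pages):
    """Build the index in advance: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query word (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
hits = search(INDEX, "marine science")
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is what makes the search fast.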

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search
engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
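
The idea of ranking by links can be illustrated with a simplified PageRank-style calculation. This is a sketch only, not the actual algorithm used by Google: the link graph is hypothetical, and the damping factor of 0.85 and the number of iterations are assumptions chosen for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = links.get(page) or list(pages)
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}
scores = link_rank(LINKS)
```

In this toy graph C and D each receive two links, but D scores higher because one of its links comes from the well-linked page C; page A, which receives no links, scores lowest of the three.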

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
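
What such an automatic expansion amounts to can be sketched as follows. This does not show how Google implements the tilde internally; the synonym table and the function expand_query are hypothetical and only illustrate the principle.

```python
# Hypothetical synonym table; a real system would use a full thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word as is
        else:
            parts.append(token)  # words without a tilde are not expanded
    return " ".join(parts)


expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; "cheap" stays unchanged even though the table contains a synonym for it.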

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
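
The effect of categorizing results can be imitated with a very simple sketch. The result titles and the cue words per category are hypothetical, and real systems derive their clusters automatically instead of using a fixed word list as assumed here.

```python
from collections import defaultdict

# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal: philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler download",
    "Thoughts of the philosopher Blaise Pascal",
]

# Hypothetical cue words per category (an assumption of this sketch).
CATEGORIES = {
    "Philosopher": {"blaise", "philosopher"},
    "Programming language": {"programming", "compiler", "language"},
}


def categorize(results, categories=CATEGORIES):
    """Group result titles under the first category whose cue words match."""
    groups = defaultdict(list)
    for title in results:
        words = set(title.lower().replace(":", " ").split())
        label = next(
            (name for name, cues in categories.items() if words & cues),
            "Other",
        )
        groups[label].append(title)
    return dict(groups)


clusters = categorize(RESULTS)
```

The user can then pick the group that matches the intended meaning instead of scanning one mixed list.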

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.
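
The effect of combining several indexes can be illustrated with a small calculation. This is only a sketch in Python: the page identifiers and the contents of the three indexes are invented for illustration and do not describe any real search system.

```python
# Hypothetical universe of static WWW pages and three hypothetical indexes.
web = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
index_a = {"p1", "p2", "p3", "p4"}
index_b = {"p3", "p4", "p5"}
index_c = {"p1", "p6"}


def coverage(indexed, universe):
    """Fraction of the universe that is present in the given index."""
    return len(indexed & universe) / len(universe)


# Searching several indexes corresponds to taking the union of their contents.
combined = index_a | index_b | index_c

print(coverage(index_a, web))   # coverage of the largest single index
print(coverage(combined, web))  # coverage when all three are used
```

In this invented case the combined coverage is higher than that of any single index, but still not complete.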

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
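
Recall and precision can be made concrete with a small calculation. The following is a minimal sketch in Python; the document identifiers and the two result lists are invented for illustration.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical example: 4 relevant documents exist for the query.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2"]  # 2 relevant out of 5 retrieved
specialised_engine = ["d1", "d2", "d3", "x1"]    # 3 relevant out of 4 retrieved

for name, hits in [("general", general_engine), ("specialised", specialised_engine)]:
    print(name, precision(hits, relevant_docs), recall(hits, relevant_docs))
```

In this invented case the specialised system scores higher on both measures; real systems have to be evaluated query by query.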

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
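
What truncation means can be shown with a short sketch in Python. The helper names, the `*` truncation symbol and the sample sentence are assumptions chosen for illustration; the last function shows a possible workaround (listing the word variants explicitly) for systems without truncation, assuming the system accepts a Boolean OR.

```python
import re


def truncation_to_regex(term):
    """Turn a truncated term such as 'librar*' into a whole-word pattern."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def matches(term, text):
    """Return all words in the text that match the (possibly truncated) term."""
    return [m.group(0) for m in truncation_to_regex(term).finditer(text)]


def expand_with_or(variants):
    """Workaround without truncation: list the variants explicitly."""
    return " OR ".join(variants)


text = "The library employs two librarians; both libraries share one catalogue."
print(matches("librar*", text))  # truncated term finds all three word forms
print(matches("library", text))  # exact term finds only one
print(expand_with_or(["library", "libraries", "librarian"]))
```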

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
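
To illustrate what a proximity operator does, here is a minimal sketch in Python. The function name `near`, the default distance of 5 words and the sample sentence are assumptions for illustration; they do not describe the NEAR operator of any particular search system.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if both terms occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


sentence = "Open access journals are indexed in a directory."
print(near(sentence, "open", "journals", max_distance=3))
print(near(sentence, "open", "directory", max_distance=3))
```

A plain AND query would accept the sentence in both cases; the proximity condition accepts it only when the two words are close together.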

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
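
The idea of clustering results on the fly can be sketched in a few lines of Python. This is a deliberately simple illustration and NOT the method used by Vivisimo: each result title is placed under the word that it shares with the largest number of other results. The stop word list and the result titles for the query "pascal" are invented.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def tokens(text, query):
    """Lower-case words of the text, without stop words and the query word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and w != query.lower()]


def cluster(results, query):
    """Group result titles under the most widely shared word each one contains."""
    counts = Counter(w for r in results for w in set(tokens(r, query)))
    clusters = defaultdict(list)
    for r in results:
        words = set(tokens(r, query))
        # Heading = the word shared by most results; ties broken alphabetically.
        heading = min(words, key=lambda w: (-counts[w], w)) if words else "other"
        clusters[heading].append(r)
    return dict(clusters)


results = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and his wager",
]
for heading, titles in cluster(results, "pascal").items():
    print(heading, titles)
```

Only the retrieved titles are analysed, after the search, which mirrors the point that no pre-processing of the documents is needed.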

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
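
The integration step with removal of repeated results can be sketched as follows in Python. The two "engines" are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query over the network. The normalisation rule (ignore a leading "www." and a trailing slash) is also an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


# Hypothetical stand-ins for primary search systems.
def engine_one(query):
    return ["http://www.example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/a/", "http://example.org/c"]


def normalise(url):
    """Comparison key: host without 'www.' plus path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel, merge the hits, drop repeated results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for hits in result_lists:
        for url in hits:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


print(meta_search("open access", [engine_one, engine_two]))
```

The threads send the query to all engines at the same time, which is where the time saving of a multi-threaded search system comes from.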

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
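
How a tracking program can notice that a page has changed is sketched below in Python. This is an assumption about one simple possible approach (comparing fingerprints of the page text), not a description of Copernic or of any alert service. Fetching the page is left out, so the page text is passed in directly, and the URL is a placeholder.

```python
import hashlib


def fingerprint(page_text):
    """Return a fingerprint (SHA-256 hash) of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_for_change(url, current_text, stored):
    """Compare the page with its stored fingerprint and update the store.

    Returns 'new', 'changed' or 'unchanged'.
    """
    new_print = fingerprint(current_text)
    old_print = stored.get(url)
    stored[url] = new_print
    if old_print is None:
        return "new"
    return "unchanged" if old_print == new_print else "changed"


store = {}
url = "http://www.example.org/news.html"  # placeholder URL
print(check_for_change(url, "Version 1 of the page", store))
print(check_for_change(url, "Version 1 of the page", store))
print(check_for_change(url, "Version 2 of the page", store))
```

An alert service would run such a check at regular intervals and send an email whenever the result is "changed" or "new".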

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
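
Using a thesaurus to formulate a query can be sketched as follows in Python. The thesaurus entry for "sea" is invented for illustration, and the output format (terms joined with OR, phrases in quotation marks) is an assumption about the query syntax of the target database.

```python
# Hypothetical thesaurus entry, using the relation labels BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Build an OR query from a term plus the thesaurus terms related to it."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms, narrower and related terms tends to increase recall; moving to a single, more suitable term tends to increase precision.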

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
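
The two directions can be sketched with a small citation graph. The document names and the functions snowball and build_citation_index are hypothetical; a real citation index is of course far larger and is built by the database producer.

```python
# Hypothetical data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(doc, references):
    """Follow reference lists backwards in time, starting from a seed."""
    found, stack = set(), [doc]
    while stack:
        for cited in references.get(stack.pop(), []):
            if cited not in found:
                found.add(cited)
                stack.append(cited)
    return found

def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index

older = snowball("seed", REFERENCES)
newer = build_citation_index(REFERENCES).get("seed", set())
print(sorted(older))
print(sorted(newer))
```

Snowballing only needs the reference lists of the documents in hand; finding the more recent citing documents requires the inverted structure, which is why a citation index is needed for that direction.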

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
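
A small helper can generate these variants, so that none is forgotten. This is only a sketch: the function name url_variants is hypothetical, and it assumes that the default page of the site is called index.html (or index.htm), which depends on the web server.

```python
def url_variants(url):
    """Return common spellings of the same web address for link searches."""
    base = url.split("://", 1)[-1]  # drop the scheme (http://) if present
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]  # strip the assumed default page
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("http://web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be used in a separate link search, or as an exact phrase in a normal search.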

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
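
The index at the heart of this mechanism can be sketched as an inverted index. The pages below stand in for what a crawler would have collected; the URLs, the texts and the function names build_index and search are hypothetical, and combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and sea level",
    "http://example.org/c": "Biology of freshwater fish",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
hits = search(index, "sea biology")
print(hits)
```

The query is answered from the prepared index only; no page is fetched at search time, which is what makes such systems fast enough.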

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)
50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
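
The idea of link-based ranking can be sketched with a simplified PageRank-style calculation. This is not Google's actual ranking algorithm: the function name link_rank, the damping factor 0.85, the number of iterations and the four-page link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores, computed by repeated score passing."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links shares its score with all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["a"]}
rank = link_rank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph, page c receives the most links and ranks first; page d receives only one link, but from the highly ranked page c, so it still ranks above page b, which receives none.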

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
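
How such a categorization can help with an ambiguous query is sketched below. This is not the method used by Teoma or any other system: the result titles, the hand-picked cue words and the function name cluster are hypothetical, and real systems derive their categories automatically.

```python
# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler download",
    "Pensees by the philosopher Pascal",
]

# Hypothetical cue words for each category (hand-picked for this sketch).
CATEGORIES = {
    "person": {"blaise", "philosopher", "mathematician"},
    "programming": {"programming", "compiler", "language"},
}

def cluster(results, categories):
    """Group results under the category whose cue words they share most."""
    groups = {}
    for title in results:
        words = set(title.lower().replace(",", " ").split())
        best = max(categories, key=lambda name: len(words & categories[name]))
        if not words & categories[best]:
            best = "other"  # no cue word matched at all
        groups.setdefault(best, []).append(title)
    return groups

groups = cluster(RESULTS, CATEGORIES)
for name, titles in groups.items():
    print(name, titles)
```

The user can then pick the group that matches the intended meaning instead of reformulating the query.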

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.
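
The last point can be illustrated with a small sketch. The two result sets below are invented for illustration (hypothetical URLs, not real search output). The only point is that the union of two partly overlapping indexes covers more documents than either index alone.

```python
# Hypothetical result sets returned by two Internet indexes for the same query.
index_a = {"http://example.org/a", "http://example.org/b", "http://example.org/c"}
index_b = {"http://example.org/c", "http://example.org/d"}


def combined_coverage(*result_sets):
    """Return the union of all result sets, counting each document once."""
    combined = set()
    for results in result_sets:
        combined |= results
    return combined


def overlap(a, b):
    """Fraction of the combined results that both indexes returned."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


all_results = combined_coverage(index_a, index_b)
print(len(index_a), len(index_b), len(all_results))
print(overlap(index_a, index_b))
```

A low overlap value suggests that a second index adds many documents the first one missed.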

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
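
Recall and precision, as used above, can be computed as follows. This is a minimal sketch: the document identifiers and the two result lists are invented for illustration, and in practice the full set of relevant documents is rarely known.

```python
def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Share of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical data: d* are relevant documents, x* are irrelevant ones.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "x4", "d2", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

for name, results in [("general", general_engine), ("specialised", specialised_engine)]:
    print(name, precision(results, relevant), recall(results, relevant))
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes.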

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
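
To make the first limitation concrete, here is a minimal sketch of what right-hand truncation does, and of a manual workaround when it is missing. The word list is invented, and the workaround assumes that the search system accepts an OR operator.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a right-truncated pattern such as 'librar*'."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(word for word in vocabulary if word.lower().startswith(stem))


def or_query(words):
    """Combine the word variants into one query string."""
    return " OR ".join(words)


# Hypothetical vocabulary; a searcher would list the variants by hand.
vocabulary = ["library", "libraries", "librarian", "liberty", "catalogue"]
variants = expand_truncation("librar*", vocabulary)
print(or_query(variants))
```

Without truncation, the searcher has to type the variants explicitly, as in the generated query.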

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
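
For readers unfamiliar with a proximity operator, the following sketch shows what a NEAR test checks. It is an illustration only, not the implementation of any particular search system; the function name and the default distance of 5 words are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sentence = "Open access journals in marine science"
print(near(sentence, "open", "science"))
print(near(sentence, "open", "science", max_distance=3))
```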

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
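
The idea of clustering results on the fly can be sketched as follows. This is NOT the algorithm used by Vivisimo, which is not described here. It is a deliberately naive illustration that groups result snippets under the most common word they contain, using the ambiguous query pascal from the Teoma example. The snippets and the stop-word list are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most frequent non-query word that each one contains."""
    query_words = set(query.lower().split())
    tokenised = [
        set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS - query_words
        for snippet in snippets
    ]
    # Count in how many snippets each word occurs.
    counts = Counter(word for words in tokenised for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Label = most frequent word; ties are broken alphabetically.
        label = min(words, key=lambda w: (-counts[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

Only the retrieved snippets are analysed, so nothing has to be processed before the search is made.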

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
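
Such an integration with removal of repeated results might look like the sketch below. The normalisation rule (ignore "www." and a trailing slash) and the round-robin merging are assumptions made for this illustration; real meta-search systems may rank and merge quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*ranked_lists):
    """Interleave ranked lists round-robin, keeping the first occurrence of each document."""
    seen, merged = set(), []
    longest = max((len(results) for results in ranked_lists), default=0)
    for rank in range(longest):
        for results in ranked_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Invented ranked lists from two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://vivisimo.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com", "http://www.vivisimo.com/"]
merged = merge_results(engine_1, engine_2)
print(merged)
```

Six results from the two systems are reduced to four, because two addresses differ only in "www." or a trailing slash.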

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
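
The core of such a tracking service can be sketched as below. This is a minimal illustration, not the method of any particular product. Fetching the pages and sending the e-mail alerts are left out, and the URLs and page texts are invented.

```python
import hashlib


def fingerprint(page_text):
    """Hash of the page content with whitespace collapsed, so pure layout changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with fresh page texts; return (new, modified) URLs."""
    new, modified = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return new, modified


# Hypothetical state stored after the previous visit: URL -> fingerprint.
previous = {
    "http://example.org/news": fingerprint("Open access news"),
    "http://example.org/about": fingerprint("Old text"),
}
# Hypothetical page texts fetched today.
current = {
    "http://example.org/news": "Open  access\nnews",
    "http://example.org/about": "New text",
    "http://example.org/report": "Brand new page",
}
new, modified = changed_pages(previous, current)
print("new:", new)
print("modified:", modified)
```

Storing only a fingerprint per page keeps the stored state small, at the cost of not knowing what exactly changed.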

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
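
This use of a thesaurus can be sketched as a simple query expansion. The thesaurus entry below is invented for illustration (it is not taken from any of the systems mentioned), and the OR syntax of the resulting query is an assumption about the target database.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["water body"],       # broader terms
        "NT": ["coastal sea"],      # narrower terms
        "RT": ["marine biology"],   # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found via the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms and narrower terms mainly raises recall. Replacing a vague word by a more suitable term from the thesaurus is what helps precision.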

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
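
The two directions of citation searching can be sketched as follows. The citation data in `CITES` and the function names are hypothetical; a real citation index is of course far larger and is queried through its own interface.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def backward_search(doc, cites, depth=2):
    """Snowball searching: follow reference lists back to older publications."""
    found = set()
    frontier = {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found


def forward_search(doc, cites):
    """Citation index lookup: find documents whose references include doc."""
    return {d for d, refs in cites.items() if doc in refs}


print(sorted(backward_search("seed", CITES)))
print(sorted(forward_search("seed", CITES)))
```

The backward search needs only the reference lists of the documents themselves; the forward search needs an index that covers the citing documents, which is what a citation index provides.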

211

***-

Information retrieval:
using citations (scheme)
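[Scheme: a time axis running from past to future. Starting from the seed document, snowball citation searching leads back to older publications, while citation indexing leads forward to more recent publications that cite the seed document.]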

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
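
As a small sketch, citation counts per article and per author can be tallied from the results of a citation search. The records in `CITATIONS` are hypothetical; in practice the counts are only estimates, because they depend on which citing documents the database covers.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of the cited article).
CITATIONS = [
    ("a3", "a1", "Author X"),
    ("a4", "a1", "Author X"),
    ("a4", "a2", "Author X"),
    ("a5", "a6", "Author Y"),
]


def citations_per_article(citations):
    """Count how many times each article is cited."""
    return Counter(cited for _, cited, _ in citations)


def citations_per_author(citations):
    """Count how many citations each author has received."""
    return Counter(author for _, _, author in citations)


print(citations_per_article(CITATIONS).most_common())
print(citations_per_author(CITATIONS).most_common())
```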

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
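
The variants listed above can be generated with a small helper. This is a sketch only: the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called `index.html`, which is not the case on every web server.

```python
def url_variants(url):
    """Return common spellings of the same address, to be searched in turn."""
    base = url.rstrip("/")
    suffix = "/index.html"  # assumption: the default page is index.html
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return [base + suffix, base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then used in a separate link search, and the results are combined.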

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
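
A minimal sketch of this "normal" approach: the URL is treated as an exact phrase and matched against the text of pages. The double quotes as exact-phrase syntax are an assumption (check the help pages of the search system), and the small `PAGES` collection is hypothetical.

```python
# Hypothetical collection of page texts, keyed by the address of each page.
PAGES = {
    "http://example.org/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://example.org/other.html":
        "This page mentions no addresses at all.",
}


def exact_phrase_query(url):
    """Wrap the URL in double quotes so that it is searched as an exact phrase."""
    return '"%s"' % url


def pages_mentioning(url, pages):
    """Return the addresses of the pages whose text contains the URL string."""
    return [address for address, text in pages.items() if url in text]


target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(exact_phrase_query(target))
print(pages_mentioning(target, PAGES))
```

This also finds pages that mention the address in plain text without making a hyperlink to it.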

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
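
The "simply stated" rule above can be sketched as a sort on the number of links received. The link data and the site names are hypothetical, and the sketch is not Google's actual method: it only counts links, with alphabetical order as a tie-breaker.

```python
from collections import Counter

# Hypothetical link data: (page that contains the link, site that receives it).
LINKS = [
    ("p1", "site-b"), ("p2", "site-b"), ("p3", "site-a"),
    ("p4", "site-b"), ("p5", "site-c"), ("p6", "site-a"),
]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of links received; ties are sorted alphabetically."""
    counts = Counter(target for _, target in links)
    return sorted(sites, key=lambda site: (-counts[site], site))


print(rank_by_inlinks(["site-a", "site-b", "site-c", "site-d"], LINKS))
```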

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
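
The mechanism can be illustrated with a very small inverted index. This is a sketch under stated assumptions: the pages in `PAGES` stand in for what a "robot" would have collected, the function names are hypothetical, and the query words are combined with an implicit AND, which is a design choice of this sketch only.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain all the words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine sea")))
```

The query is answered from the index that was built in advance, not by visiting the pages at search time, which is why such systems can respond quickly.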

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
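
The idea that both the number and the "importance" of linking pages matter can be sketched with a simplified link-analysis iteration. This is a textbook-style simplification and not Google's actual algorithm; the function name `link_scores`, the damping value and the tiny link graph are assumptions made for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages from their links; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = links.get(page) or list(pages)
            share = damping * score[page] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score


# Hypothetical link graph: a and b link to c, and c links to d.
LINK_GRAPH = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = link_scores(LINK_GRAPH)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page c scores higher than a or b because two pages point to it, and page d scores higher than a or b although only one page points to it, because that page is itself highly scored.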

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
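
What such an expansion amounts to can be sketched as follows. The synonym table and the function name `expand_tilde_query` are hypothetical; how the search engine performs the expansion internally is not described here, so this only illustrates the principle of replacing a marked word by an OR group.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {"boat": ["ship", "vessel"]}


def expand_tilde_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            group = [word] + synonyms.get(word, [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_tilde_query("cheap ~boat rental"))
```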

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
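
Recall and precision, the two measures named on this slide, can be made concrete with a small calculation. The following is only an illustrative sketch: the function name `recall_precision` and the document identifiers are assumptions made for this example and do not come from any particular search system.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: identifiers of the documents the search system returned
    relevant:  identifiers of all documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    # Recall: which fraction of the relevant documents was found?
    recall = len(hits) / len(relevant) if relevant else 0.0
    # Precision: which fraction of the returned documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 4 relevant documents exist; the system returns 5,
# of which 3 are relevant.
recall, precision = recall_precision(
    retrieved=["d1", "d2", "d3", "d8", "d9"],
    relevant=["d1", "d2", "d3", "d4"],
)
print(recall, precision)
```

A specialised index tends to raise precision because fewer off-topic pages are in its database at all; whether recall also rises depends on how completely it covers its own subject.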

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
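
To show what truncation means in practice, here is a minimal sketch of right-truncation against a small word list. The function name `truncate_match`, the `*` wildcard and the vocabulary are assumptions chosen for illustration; they do not describe the query syntax of any specific search engine. When a system lacks truncation, the searcher can obtain a similar effect by typing the variants as an OR query, which is what the last line builds.

```python
def truncate_match(pattern, vocabulary):
    """Return the index terms matched by a right-truncated pattern.

    A trailing '*' (an assumed wildcard) matches any ending;
    without it, only the exact word matches.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty", "index"]

matches = truncate_match("librar*", vocabulary)
# Manual workaround for a system without truncation: list the variants.
or_query = " OR ".join(matches)
print(or_query)
```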

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
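
The idea of clustering results on the fly can be sketched with a very simple rule. This is NOT the method used by Vivisimo, which is not described here; it is a toy illustration that groups result titles under the non-query word they share with the most other results. The function name `cluster`, the stop-word list and the titles are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Assumed, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster(query, titles):
    """Group result titles under a shared word, using only the results
    themselves (no pre-processing of documents before the search)."""
    query_words = set(query.lower().split())
    token_sets = []
    for title in titles:
        words = {w.strip(".,:;!?").lower() for w in title.split()}
        token_sets.append(words - query_words - STOPWORDS - {""})
    # In how many results does each word occur?
    doc_freq = Counter(w for tokens in token_sets for w in tokens)
    clusters = defaultdict(list)
    for title, tokens in zip(titles, token_sets):
        if tokens:
            # Most widely shared word; ties are broken alphabetically.
            label = min(tokens, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming tutorial",
    "Free Pascal programming compiler",
    "Blaise Pascal philosopher",
    "Pascal philosopher and mathematician",
]
groups = cluster("pascal", titles)
for label, members in groups.items():
    print(label, members)
```

Real clustering engines use far richer text analysis, but the sketch shows why no advance indexing of the documents is needed: the headings are derived from the retrieved results alone.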

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you start to
use the system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
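
The integration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular meta-search system: the rule in `normalise` (ignore the scheme, the case of the host name, a leading "www." and a trailing slash) and the interleaving by rank in `merge_results` are choices made for this example, and the URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (assumed rule: ignore scheme,
    host case, a leading 'www.' and a trailing slash)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first
    occurrence of each page and dropping repeats."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/", "http://example.org/page1"]
engine_b = ["http://example.org", "http://example.org/page2"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Interleaving by rank keeps the top results of every engine near the top of the merged list; a real system might instead weight engines differently or re-rank the merged set.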

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
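
How a tracking service can tell a modified page from an unchanged one can be sketched by comparing fingerprints of the page text between visits. This is an illustrative sketch only: the class name `PageTracker` and the URL are hypothetical, the fetching of the page is left out (the text is passed in directly), and real alert services may use other comparison methods.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest that changes whenever the text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


class PageTracker:
    """Remember one fingerprint per URL and report what happened since
    the previous visit."""

    def __init__(self):
        self._known = {}

    def check(self, url, page_text):
        """Return 'new', 'changed' or 'unchanged' for this visit."""
        digest = fingerprint(page_text)
        previous = self._known.get(url)
        self._known[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"


tracker = PageTracker()
# Hypothetical URL and page contents on three successive visits.
first = tracker.check("http://example.org/news", "version 1")
second = tracker.check("http://example.org/news", "version 1")
third = tracker.check("http://example.org/news", "version 2")
print(first, second, third)
```

Storing only a digest keeps the tracker small, at the cost of not being able to show what exactly changed; a service that highlights differences would have to keep the previous text as well.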

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user
interface instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first use one or several thesaurus systems
to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
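
Recall and precision can be computed for a single query once it is known which documents are relevant. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the documents that the search system returned
    relevant:  identifiers of the documents that really answer the need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
```

Adding well-chosen synonyms to a query tends to raise recall; choosing more specific terms tends to raise precision.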

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
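
Both directions of citation searching can be sketched with a small citation graph. The data below are invented for illustration, and the function names are hypothetical; a real citation index is of course far larger.

```python
from collections import defaultdict

# Hypothetical data: each document is mapped to the documents in its reference list.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990"],
    "c_2003": ["seed_2000"],
    "d_2004": ["seed_2000", "c_2003"],
}


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))  # the snowball effect
    return found


older = snowball("seed_2000", REFERENCES)                  # towards the past
newer = build_citation_index(REFERENCES)["seed_2000"]      # towards the future
```

The length of the "newer" list is the number of citations received by the seed document within this small data set.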

211

***-

Information retrieval:
using citations (scheme)
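[Diagram: a time axis (Past, Now, Future) with the seed document on it;
snowball citation searching leads from the seed document towards the past,
citation indexing leads from the seed document towards the future.]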

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
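
The variants listed above can be generated automatically before they are submitted to a search system. This is only a sketch: the function name is hypothetical, and it assumes that "index.html" is the default page name on the web server, which is not always the case.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the spellings under which the same page may have been linked."""
    base = url
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # cut off the default page name
            break
    base = base.rstrip("/")            # cut off a trailing slash, if any
    with_index = [base + "/" + name for name in index_names]
    return with_index + [base + "/", base]


variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant is then used in a separate link search, and the results are combined.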

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 41


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via the WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
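
The "simply stated" version of this ranking, counting the links received by each site, can be sketched as follows. The link data and the function name are hypothetical, and the real link analysis by Google is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (page that contains the link, site that receives it).
LINKS = [
    ("p1", "site-a"), ("p2", "site-a"), ("p4", "site-a"),
    ("p3", "site-b"),
    ("p1", "site-c"), ("p2", "site-c"),
]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of links received, highest first.

    Sites with the same number of links are sorted alphabetically,
    as in a directory without link analysis.
    """
    counts = Counter(target for _source, target in links)
    return sorted(sites, key=lambda site: (-counts[site], site))


ranking = rank_by_inlinks(["site-a", "site-b", "site-c"], LINKS)
```

In this invented example, site-c moves ahead of site-b, although site-b comes first alphabetically.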

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
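
The mechanism described above (collected pages, an index built automatically from their texts, and a search function on top) can be sketched in a few lines. The pages and URLs below are invented and stand in for what a robot would collect; combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what the "robot" collects from the Internet.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and the sea level",
    "http://example.org/c": "Biology courses online",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: for each word, the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(PAGES)
results = search(index, "sea biology")
```

A query is answered from the index only; the pages themselves are not visited at search time, which is why such systems can answer quickly.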

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
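
The idea that a page ranks higher when many pages and "important" pages point to it can be sketched with a simplified iteration in the style of the published PageRank method. This is an illustration only, not the algorithm as Google runs it; the damping value, the number of iterations and the sample links are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores.

    links: dictionary that maps each page to the list of pages it links to.
    Returns a dictionary page -> score; the scores add up to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical links: a, b and d point to c; c points back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this invented example, c scores highest because three pages point to it, and a scores higher than b because the "important" page c points to a.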

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
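
What the tilde asks for can be imitated by hand: each marked word is replaced by the word together with its synonyms. The sketch below is not how Google implements the operator; the synonym list, the function name and the OR notation with brackets are assumptions chosen for illustration.

```python
# Hypothetical synonym list; a real system would use a much larger one.
SYNONYMS = {"sea": ["ocean", "marine"]}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word in the query by (word OR synonym OR ...)."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("pollution ~sea coast")
```

Writing the expansion out in this way also works with retrieval systems that do not know the tilde, as long as they accept OR.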

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
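Recall is the share of all relevant documents that a search retrieves; precision is the share of the retrieved documents that are relevant. A minimal sketch of both measures, with invented document identifiers (the function name precision_recall is only an illustration):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers of the documents the search returned
    relevant:  identifiers of all documents that are relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant documents exist,
# 2 of the retrieved documents are relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(p)            # 0.5
print(round(r, 2))  # 0.67
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.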

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
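Truncation means that the searcher types only the beginning of a word, followed by a wildcard, and the system matches every word that starts that way. When a search system lacks this feature, the searcher can expand the term by hand and combine the variants with OR. A minimal sketch, assuming a small word list is available; the vocabulary and the function names are hypothetical:

```python
def expand_truncated(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list.

    A term without a trailing '*' is returned unchanged.
    """
    if not term.endswith("*"):
        return [term]
    stem = term[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))


def to_or_query(term, vocabulary):
    """Build an OR query from all expansions of a truncated term."""
    return " OR ".join(expand_truncated(term, vocabulary))


# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]

print(to_or_query("librar*", VOCABULARY))
# librarian OR libraries OR library
```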

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
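Clustering on the fly can be pictured as grouping the retrieved result descriptions at search time, by words that several of them share. The following is only a toy sketch of that idea, not the method used by Vivisimo; the stop word list, the sample results and the function name cluster_results are assumptions made for illustration:

```python
from collections import Counter

# Small, hypothetical stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets, max_clusters=3):
    """Group result snippets under words that occur in more than one snippet."""
    query_words = set(query.lower().split())
    tokenised = [
        [w for w in s.lower().split()
         if w not in STOPWORDS and w not in query_words]
        for s in snippets
    ]
    # Count in how many snippets each word occurs (document frequency).
    doc_freq = Counter(w for words in tokenised for w in dict.fromkeys(words))
    labels = [w for w, n in doc_freq.most_common() if n > 1][:max_clusters]
    clusters = {label: [] for label in labels}
    clusters["other"] = []
    for snippet, words in zip(snippets, tokenised):
        # Assign each snippet to the first (most frequent) label it contains.
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(snippet)
    return clusters


# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal philosopher and mathematician",
    "Pascal unit of pressure",
]

for label, items in cluster_results("pascal", RESULTS).items():
    print(label, "->", items)
```

The grouping uses only the results that were just retrieved, so nothing has to be classified before the search.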

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
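Integrating results with removal of repeated results can be sketched as follows: the ranked lists of the primary search systems are interleaved, and a link is dropped when an equivalent address has already been seen. This is a minimal sketch under the assumption that two addresses are equivalent when they differ only in a leading www. or a trailing slash; the function names are hypothetical and real meta-search systems may merge differently:

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path, query."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated addresses."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://vivisimo.com/"]

print(merge_results(engine_1, engine_2))
# ['http://www.dogpile.com/', 'http://www.kartoo.com', 'http://vivisimo.com/']
```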

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
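Tracking modified versus new pages can be pictured as keeping a fingerprint of each page from the previous visit and comparing it with the current text. A minimal sketch that works on page texts already held in memory; the addresses, the texts and the function names are invented, and no existing alert service is claimed to work this way:

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text after collapsing whitespace, so that pure
    layout changes do not count as modifications."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with the current page texts.

    previous: dict of address -> fingerprint from the last visit
    current:  dict of address -> page text from this visit
    Returns (modified addresses, new addresses), both sorted.
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return sorted(modified), sorted(new)


# Hypothetical data, for illustration only.
previous = {
    "http://example.org/a": fingerprint("Hello world"),
    "http://example.org/b": fingerprint("Old text"),
}
current = {
    "http://example.org/a": "Hello   world\n",
    "http://example.org/b": "New text",
    "http://example.org/c": "Brand new page",
}

print(changed_pages(previous, current))
# (['http://example.org/b'], ['http://example.org/c'])
```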

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
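The use of a thesaurus to prepare a query can be sketched in a few lines of Python. This is only an illustration: the mini-thesaurus below, its terms and the function name expand_query are hypothetical, and a real thesaurus system would offer far more terms and relations.

```python
# Hypothetical mini-thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["water body"],       # broader terms
        "NT": ["coastal sea"],      # narrower terms
        "RT": ["marine science"],   # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with selected thesaurus terms into a Boolean OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly aims at higher recall; choosing more suitable terms from the thesaurus is what may improve precision.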

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
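Both directions of citation searching can be illustrated with a small Python sketch. The citation data and the document names below are invented for the example; a real citation index holds this information for millions of publications.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, references):
    """Follow reference lists back in time, step by step (snow-ball effect)."""
    found = []
    todo = list(references.get(doc, []))
    while todo:
        current = todo.pop(0)
        if current not in found:
            found.append(current)
            todo.extend(references.get(current, []))
    return found


def cited_by(doc, references):
    """Find the (more recent) documents that cite the given document."""
    return sorted(src for src, refs in references.items() if doc in refs)


print(snowball("seed", REFERENCES))   # older publications
print(cited_by("seed", REFERENCES))   # more recent publications
```

The first function needs only the reference lists of documents you already have; the second needs an index over the reference lists of all other documents, which is what a citation index provides.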

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with the seed document; snowball citation searching leads to older publications, citation indexing to more recent ones.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
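The variants listed above can be generated automatically, for instance with the following Python sketch. The function name url_variants is hypothetical, and the assumption that the default file is named index.html will not hold for every web server.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the variants of a URL with and without index file and slash.

    The default file name(s) in index_names are an assumption; servers
    may use other names such as index.htm or default.htm.
    """
    root = url
    for name in index_names:
        if root.endswith("/" + name):
            root = root[: -len(name)]   # drop the index file name
            break
    root = root.rstrip("/")             # drop the trailing slash, if any
    return [root + "/" + name for name in index_names] + [root + "/", root]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link query, in the syntax required by the search system.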

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 42


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
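The two components described above can be illustrated with a minimal Python sketch. The pages below are invented and stand in for what a crawler would have collected; real systems add many refinements (ranking, stemming, phrase searching and so on) that are left out here.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what the "robot" has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and the sea level",
    "http://example.org/c": "Open access journals in biology",
}


def build_index(pages):
    """Build an index that maps each word to the URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain ALL words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


index = build_index(PAGES)
print(search(index, "sea biology"))
```

The query is answered from the index alone, without visiting the pages again; this is why such a system answers quickly, and also why its results can be out of date.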

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
the more familiar Google web search and the
Amazon.com book search system separately.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
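The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified link analysis in Python. This is a sketch in the spirit of the published PageRank idea, not the algorithm actually used by Google; the link data, the function name link_rank and the values of damping and iterations are assumptions made for the example.

```python
# Hypothetical link data: each page maps to the pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c", "e"],
    "c": ["a"],
    "d": ["c"],
    "e": ["c"],
}


def link_rank(links, damping=0.85, iterations=50):
    """Estimate the importance of pages from the links between them."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its importance evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example page c receives the most links and ranks highest; page a receives only one link, but from the important page c, and therefore ranks above page e, which receives its single link from the little-cited page b.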

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in
the meaning of the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in
the meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in
the meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
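
Recall and precision can be computed as in the sketch below. It uses the usual definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant). The document identifiers and both result sets are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 documents are relevant for the information need.
relevant_docs = ["d1", "d2", "d9", "d10"]

# A general index returns many documents, few of them relevant.
general = precision_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant_docs
)

# A specialised index returns fewer documents, most of them relevant.
specialised = precision_recall(["d1", "d2", "d9", "d11"], relevant_docs)

print(general)      # (0.25, 0.5)
print(specialised)  # (0.75, 0.75)
```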

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
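
What truncation means can be shown with a small sketch. The function truncation_match and the word list are assumptions for illustration; they do not describe how any particular retrieval system implements truncation.

```python
def truncation_match(pattern, vocabulary):
    """Right-hand truncation: 'librar*' matches every word
    that starts with 'librar'. Without '*', only the exact
    word matches."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical words that occur in the indexed documents.
words = ["library", "libraries", "librarian", "liberty"]

with_truncation = truncation_match("librar*", words)
without_truncation = truncation_match("library", words)

print(with_truncation)     # ['librarian', 'libraries', 'library']
print(without_truncation)  # ['library']
```

In a system without truncation, the searcher has to type the word variants explicitly, for instance combined with OR.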

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
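
A proximity operator such as NEAR can be illustrated by the following sketch. The function near, the default distance and the two example texts are assumptions; the exact meaning of NEAR differs between retrieval systems.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True when term_a and term_b occur within max_distance
    word positions of each other in the text."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(
        abs(i - j) <= max_distance for i in positions_a for j in positions_b
    )


# Hypothetical document texts.
text_1 = "Open access journals are indexed in a directory"
text_2 = "Open source software is discussed in many computing journals"

print(near(text_1, "open", "journals"))  # True: 2 positions apart
print(near(text_2, "open", "journals"))  # False: 8 positions apart
```

A plain Boolean AND would retrieve both texts; the proximity operator retrieves only the first one.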

96

**--

Meta-search systems:
scheme 1
[Diagram, labels only: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram, labels only: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram, labels only: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram, labels only: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram, labels only: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
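
The integration of results with removal of repeated results can be sketched as follows. This is a simplified illustration, not the method of any particular meta-search system: the normalisation rules (ignore "www.", upper/lower case of the host and a trailing slash) and the example.org / example.net addresses are assumptions.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (a simplification)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Merge result lists in order and keep only the first
    occurrence of each (normalised) URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical results from two primary search systems.
engine_a = ["http://www.example.org/oa/", "http://example.org/journals"]
engine_b = ["http://example.org/oa", "http://example.net/archive"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

Here the page found by both systems appears only once in the merged list.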

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram, labels only: Internet; WWW; telnet; ftp; ...; Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; Databases and file archives accessible through the Internet (CGI, ASP,...); Information accessible only when passwords are used; Rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
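
The principle of tracking new and changed pages can be sketched as follows: store a fingerprint of each page and compare it at the next visit. This is only an illustration under assumptions: the page contents are given directly as bytes instead of being downloaded, and the example.org addresses are invented. It does not describe how Copernic or any alert service works internally.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """A SHA-256 digest that changes when the content changes."""
    return hashlib.sha256(content).hexdigest()


def check_pages(stored, current_pages):
    """Compare pages with the fingerprints of the last visit.

    stored: dict URL -> fingerprint (updated in place)
    current_pages: dict URL -> page content as bytes
    Returns (new_urls, changed_urls).
    """
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls


stored = {}
first_visit = check_pages(
    stored,
    {"http://example.org/a": b"version 1", "http://example.org/b": b"hello"},
)
second_visit = check_pages(
    stored,
    {"http://example.org/a": b"version 2", "http://example.org/b": b"hello"},
)
print(first_visit)   # both pages are new
print(second_visit)  # only page a has changed
```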

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM, followed by RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic
»?
• To find book titles published before 1990
»?
• To find a book title through a title search
»?
• To find the price of a book
»?
• To be informed regularly about new books
»?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
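
An interest profile stored as keywords can be matched against new book titles as in the sketch below. The names matches_profile and alerts_for, the reader names and the titles are assumptions for illustration; this is not the method of any particular bookshop.

```python
import re


def matches_profile(title, keywords):
    """True when at least one keyword occurs as a word in the title."""
    words = set(re.findall(r"\w+", title.lower()))
    return any(keyword.lower() in words for keyword in keywords)


def alerts_for(new_titles, profiles):
    """profiles: dict user -> list of keywords.
    Returns dict user -> list of matching new titles."""
    return {
        user: [t for t in new_titles if matches_profile(t, keywords)]
        for user, keywords in profiles.items()
    }


# Hypothetical new titles and stored interest profiles.
new_titles = [
    "Marine Pollution and Coastal Management",
    "A History of Printing",
    "Digital Libraries: An Introduction",
]
profiles = {"reader_1": ["marine", "coastal"], "reader_2": ["libraries"]}

alerts = alerts_for(new_titles, profiles)
print(alerts)
```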

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
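
Simultaneous, parallel searching over several targets can be sketched as follows. The three catalogues are hypothetical in-memory lists that stand in for remote systems; a real service would send the query over the network (for instance with Z39.50 or through WWW forms), which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote library and bookshop databases.
CATALOGUES = {
    "library_a": ["Marine ecology", "Coastal management"],
    "library_b": ["Marine pollution", "Freshwater biology"],
    "bookshop_c": ["Introduction to marine biology"],
}


def search_catalogue(name, term):
    """Stand-in for one remote search: simple substring match."""
    hits = [t for t in CATALOGUES[name] if term.lower() in t.lower()]
    return name, hits


def simultaneous_search(term):
    """Send the same query to all catalogues in parallel threads
    and collect the results per catalogue."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [
            pool.submit(search_catalogue, name, term) for name in CATALOGUES
        ]
        return dict(future.result() for future in futures)


results = simultaneous_search("marine")
print(results)
```

Only a simple search option (one term) is passed to every target, which mirrors the limitation mentioned above: the common interface can offer only what all target systems support.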

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM, followed by RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic
»Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990
»national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general
»Library of Congress, British Library, Infoball
• To find the price of a book
»Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books
»Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
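
The four relations can be held in a small data structure. The following is only a minimal sketch: the class name `ThesaurusEntry`, the helper `describe` and the sample terms are invented for illustration and are not taken from any of the thesaurus systems mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One thesaurus term together with its four standard relations."""
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


# Hypothetical entry; the terms are made up for this example.
sea = ThesaurusEntry(
    term="sea",
    broader=["water body"],
    narrower=["inland sea"],
    related=["coast"],
    used_for=["ocean"],
)


def describe(entry: ThesaurusEntry) -> list[str]:
    """Return the relations of an entry as 'CODE: term' lines."""
    lines = []
    for code, terms in (("BT", entry.broader), ("NT", entry.narrower),
                        ("RT", entry.related), ("UF", entry.used_for)):
        lines.extend(f"{code}: {t}" for t in terms)
    return lines


print(describe(sea))
```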

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
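
The step from "terms found in a thesaurus" to "a query for the database" can be sketched as follows. This is an illustration under assumptions: the function `expand_query`, the sample thesaurus and the AND/OR syntax are hypothetical, and each real database has its own query syntax that should be checked in its help pages.

```python
def expand_query(concepts: list[str], thesaurus: dict[str, list[str]]) -> str:
    """OR together the alternative terms of each concept, then AND the concepts."""
    groups = []
    for concept in concepts:
        terms = [concept] + thesaurus.get(concept, [])
        # Put multi-word terms between quotes so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


# Hypothetical terms that a searcher might have collected from a thesaurus.
found_terms = {
    "sea": ["ocean"],
    "pollution": ["contamination", "oil spill"],
}

print(expand_query(["sea", "pollution", "policy"], found_terms))
# (sea OR ocean) AND (pollution OR contamination OR "oil spill") AND policy
```

OR-ing synonyms within one concept aims at higher recall; AND-ing the separate concepts keeps precision under control.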

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
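
Both directions can be illustrated on a tiny citation graph. The sketch below uses invented document names and hypothetical function names (`snowball`, `cited_by`); a real citation index is of course far larger and is queried through its own interface.

```python
# Hypothetical data: each document maps to the documents it cites.
cites = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc: str, cites: dict[str, list[str]], depth: int = 2) -> set[str]:
    """Follow reference lists backwards in time, up to `depth` steps."""
    found: set[str] = set()
    frontier = {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found


def cited_by(doc: str, cites: dict[str, list[str]]) -> set[str]:
    """Look forwards in time: which documents cite `doc`?"""
    return {d for d, refs in cites.items() if doc in refs}


print(sorted(snowball("seed", cites)))  # ['old1', 'old2', 'older']
print(sorted(cited_by("seed", cites)))  # ['new1', 'new2']
```

The first function needs only the seed document and the reference lists; the second needs an index over many documents, which is what a citation index provides.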

211

***-

Information retrieval:
using citations (scheme)
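[Scheme: a time axis running from past to future, with the seed
document as starting point. Snowball citation searching leads back
to older publications; citation indexing leads forward to more
recent publications.]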

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
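
As a rough illustration of such an estimate, the sketch below counts citations per article and per author from the results of a citation search. All data and names (`citing`, `authors`, `article_counts`, `author_counts`) are hypothetical; real counts depend on the coverage of the citation database that is used.

```python
from collections import Counter

# Hypothetical search results: each citing article maps to the articles it cites.
citing = {
    "p1": ["p0"],
    "p2": ["p0", "p1"],
    "p3": ["p1"],
}

# Hypothetical authorship data for the cited articles.
authors = {
    "p0": ["Ann"],
    "p1": ["Ann", "Bob"],
}


def article_counts(citing: dict[str, list[str]]) -> Counter:
    """Number of citations received by each article."""
    return Counter(ref for refs in citing.values() for ref in refs)


def author_counts(citing: dict[str, list[str]],
                  authors: dict[str, list[str]]) -> Counter:
    """Number of citations received by each author, summed over their articles."""
    totals: Counter = Counter()
    for article, n in article_counts(citing).items():
        for author in authors.get(article, []):
            totals[author] += n
    return totals


print(article_counts(citing))          # p0 and p1 each receive 2 citations
print(author_counts(citing, authors))  # Ann receives 4, Bob receives 2
```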

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
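
Generating the variants can be automated. The following is a sketch only: the function names are invented, and the `link:` prefix is an assumption about the query syntax, which differs between search systems and should be checked in the help pages of the system used.

```python
def url_variants(url: str) -> list[str]:
    """Return three common spellings of the same web address."""
    base = url
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


def link_queries(url: str, prefix: str = "link:") -> list[str]:
    """Build one link query per variant; the prefix is an assumed syntax."""
    return [prefix + variant for variant in url_variants(url)]


for query in link_queries("http://www.example.org/website/"):
    print(query)
# link:http://www.example.org/website/index.html
# link:http://www.example.org/website/
# link:http://www.example.org/website
```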

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 43

1


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
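
The difference between the two orderings can be shown with a toy example. This is a simplification of the "simply stated" rule above, not the actual ranking method used by Google; the site names and link counts are invented.

```python
def rank_by_inlinks(sites: list[str], inlinks: dict[str, int]) -> list[str]:
    """Order sites by the number of links received (most first), then by name."""
    return sorted(sites, key=lambda site: (-inlinks.get(site, 0), site))


# Hypothetical directory category and hypothetical link counts.
category = ["alpha.example", "beta.example", "gamma.example"]
links_received = {"beta.example": 120, "gamma.example": 15}

print(sorted(category))
# ['alpha.example', 'beta.example', 'gamma.example']  (alphabetical order)
print(rank_by_inlinks(category, links_received))
# ['beta.example', 'gamma.example', 'alpha.example']  (order by links received)
```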

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate
many items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
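
The core of such a database is often called an inverted index: for each word, the list of pages that contain it. The sketch below is a toy version under assumptions: the page texts are given directly in a dictionary instead of being collected by a robot, the addresses are invented, and a query is treated as "all words must occur".

```python
import re
from collections import defaultdict

# Hypothetical pages, as if they had already been collected by a robot.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Sea level and hydrology",
    "http://c.example/": "Library science",
}


def words_of(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page addresses whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in words_of(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = words_of(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(sorted(search(index, "sea")))            # the pages a.example and b.example
print(sorted(search(index, "sea hydrology")))  # only the page b.example
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why such systems can answer quickly but may be out of date.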

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
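
The two rules (many links count, and links from "important" pages count more) can be illustrated with a simplified PageRank-style calculation. This is a textbook sketch under assumptions, not the algorithm actually run by Google; the function name, the damping value 0.85, the number of iterations and the tiny link graph are all chosen for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified link-based ranking; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: a, b and d all link to c; c links to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True)[:2])  # ['c', 'a']
```

Page c ranks first because many pages point to it; page a ranks second although only one page points to it, because that one page is the "important" page c.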

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
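
A query string of this form can be assembled as follows. The helper `mark_for_synonyms` is hypothetical and only builds the text of the query; whether the tilde is honoured depends on the search system and its version.

```python
def mark_for_synonyms(words: list[str], expand: set[str]) -> str:
    """Put a tilde in front of each word for which synonyms are wanted."""
    return " ".join("~" + w if w in expand else w for w in words)


print(mark_for_synonyms(["marine", "pollution", "law"], {"pollution"}))
# marine ~pollution law
```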

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage

• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005

• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
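
Recall and precision can be computed as follows for one search. This is a minimal sketch under the usual definitions (recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved); the function name and the sample document identifiers are assumptions made for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, of which two are relevant;
# three relevant documents exist in total.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In practice the full set of relevant documents on the WWW is unknown, so recall can only be estimated.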

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
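
When truncation is not offered, a searcher can write out the word variants by hand and combine them with OR. A minimal sketch of that workaround, assuming a hypothetical word list supplied by the searcher; the function name and the trailing-asterisk notation are illustrative assumptions.

```python
def expand_truncation(pattern, vocabulary):
    """Turn a right-truncated word such as 'librar*' into an OR query.

    The vocabulary is a hypothetical word list chosen by the searcher.
    """
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    matches = sorted({w for w in vocabulary if w.lower().startswith(stem)})
    return " OR ".join(matches) if matches else stem

query = expand_truncation(
    "librar*", ["library", "libraries", "librarian", "liberty"]
)
```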

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
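
What a proximity operator such as NEAR does can be sketched as follows. This is an illustration of the general idea only, not the implementation of any search system; the function name and the default distance of 5 words are assumptions.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True if both terms occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "open access journals are free for every reader"
close_enough = near(sentence, "open", "free", max_distance=5)
too_far = near(sentence, "open", "reader", max_distance=3)
```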

96

**--

Meta-search systems:
scheme 1
[Diagram with these components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with these components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with these components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
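
Clustering on the fly can be illustrated with a very simple approach: group the retrieved snippets under a word that they share with other snippets. This is a sketch only and is not the algorithm of Vivisimo, which is not described here; the stop-word list, the function name and the sample snippets are assumptions.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    term_sets = []
    for snippet in snippets:
        terms = set(re.findall(r"\w+", snippet.lower()))
        term_sets.append(terms - query_terms - STOPWORDS)
    # Number of snippets in which each word occurs.
    doc_freq = Counter(t for terms in term_sets for t in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        shared = [t for t in terms if doc_freq[t] > 1]
        if shared:
            # Ties are broken alphabetically because of sorted().
            label = max(sorted(shared), key=lambda t: doc_freq[t])
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Learn the Pascal programming language",
    "Pascal unit of pressure",
]
clusters = cluster_results("pascal", snippets)
```

Nothing is computed before the search: the grouping uses only the snippets that came back for this query.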

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with these components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
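
The idea of a multi-threaded search client can be sketched as follows. This is not how Copernic works internally; the two engine functions are hypothetical stand-ins that return fixed results without any network access, so that the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    """Hypothetical search system A (stand-in, no network access)."""
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]

def engine_b(query):
    """Hypothetical search system B (stand-in, no network access)."""
    return [f"http://b.example/{query}/1"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel threads."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the result lists in the order of the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return [url for results in result_lists for url in results]

results = meta_search("water", [engine_a, engine_b])
```

With real search systems, the total waiting time is then close to that of the slowest system instead of the sum of all of them.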

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
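
Integration with removal of repeated results can be sketched as follows. The normalisation rule (ignore a leading "www." and a trailing slash) is an assumption chosen for the illustration; real meta-search systems may use other rules.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a key so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")

def merge_results(*result_lists):
    """Merge result lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

merged = merge_results(
    ["http://www.dogpile.com/", "http://www.kartoo.com"],
    ["http://dogpile.com", "http://www.mamma.com/"],
)
```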

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.
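
The difference between static pages and dynamically generated database output can often be guessed from the URL. A rough heuristic sketch, under the assumption that a query string, a /cgi-bin/ path or one of a few file extensions indicates a dynamic page; the extension list and the function name are illustrative assumptions, and such a guess can of course be wrong.

```python
from urllib.parse import urlsplit

# Illustrative extension list (an assumption, not an exhaustive rule).
DYNAMIC_EXTENSIONS = (".cgi", ".asp", ".php", ".jsp")

def looks_dynamic(url):
    """Guess whether a URL points to dynamically generated content."""
    parts = urlsplit(url)
    path = parts.path.lower()
    return (
        bool(parts.query)
        or "/cgi-bin/" in path
        or path.endswith(DYNAMIC_EXTENSIONS)
    )

static_page = looks_dynamic("http://www.example.org/page.html")
database_page = looks_dynamic("http://www.example.org/cgi-bin/search?q=water")
```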

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram with these components:
• telnet, ftp, ...
• Internet / WWW
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
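
The distinction between modified and new pages can be sketched with content fingerprints. This is a minimal illustration, not the method of any particular alert service; page contents are passed in as bytes so that no network access is needed, and all names are assumptions.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Short, fixed-length summary of the page content."""
    return hashlib.sha256(content).hexdigest()

def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched contents.

    previous: {url: fingerprint from the last visit}
    current:  {url: page content as bytes}
    Returns (modified_urls, new_urls).
    """
    modified, new = [], []
    for url, content in current.items():
        digest = fingerprint(content)
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            modified.append(url)
    return modified, new

previous = {
    "http://example.org/a": fingerprint(b"old text"),
    "http://example.org/b": fingerprint(b"same text"),
}
current = {
    "http://example.org/a": b"new text",
    "http://example.org/b": b"same text",
    "http://example.org/c": b"fresh page",
}
modified, new = changed_pages(previous, current)
```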

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ shows that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
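
Matching new book titles against a stored interest profile of keywords can be sketched as follows. This is only an illustration of the principle, not the mechanism of Amazon or any other bookshop; the function names and sample titles are assumptions.

```python
import re

def matches_profile(title, keywords):
    """True if at least one profile keyword occurs as a word in the title."""
    words = set(re.findall(r"\w+", title.lower()))
    return any(k.lower() in words for k in keywords)

def alerts(new_titles, profile_keywords):
    """Select the new titles that fit the interest profile."""
    return [t for t in new_titles if matches_profile(t, profile_keywords)]

selected = alerts(
    ["Marine ecology of the North Sea", "Cooking for beginners"],
    ["ecology", "oceanography"],
)
```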

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
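
The procedure described above (look up a term in a thesaurus first, then build the query from the terms found) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy thesaurus, the terms in it and the names `ThesaurusEntry`, `expand_term` and `build_query` are hypothetical and do not come from any real thesaurus system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThesaurusEntry:
    """One term with the four classical thesaurus relations."""
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    used_for: List[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical toy thesaurus; the terms are illustrative only.
THESAURUS: Dict[str, ThesaurusEntry] = {
    "sea": ThesaurusEntry(
        broader=["body of water"],
        narrower=["inland sea"],
        related=["coast"],
        used_for=["ocean"],
    ),
}


def expand_term(term, thesaurus, include_narrower=True):
    """Return the term plus its synonyms (UF) and, optionally, narrower terms (NT)."""
    entry = thesaurus.get(term)
    if entry is None:
        return [term]
    terms = [term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    return terms


def build_query(term, thesaurus):
    """Join the expanded terms with OR, quoting multi-word phrases."""
    parts = [f'"{t}"' if " " in t else t for t in expand_term(term, thesaurus)]
    return " OR ".join(parts)


print(build_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; broader and related terms are left out of the sketch because they tend to lower precision.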

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
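
The two directions of citation searching can be illustrated with a small Python sketch. It assumes a hypothetical set of documents whose reference lists are already known; the document names and the functions `snowball` and `build_citation_index` are invented for this example.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1998"],
    "paper_1998": ["paper_1995"],
    "paper_1995": [],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, collecting all older documents."""
    found = set()
    to_visit = list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop()
        if doc not in found:
            found.add(doc)
            to_visit.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which documents cite it?"""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


older = snowball("seed_2000", REFERENCES)                    # towards the past
newer = build_citation_index(REFERENCES).get("seed_2000")    # towards the future
print(sorted(older), sorted(newer))
```

The snowball search needs only the documents themselves, whereas the search for more recent publications needs an index that somebody has built in advance, which is exactly what a citation index provides.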

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with the seed document; snowball citation searching points towards the past, citation indexing towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
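
As a rough illustration of how such counts follow from the results of a citation search, the Python sketch below tallies citations per article and per author. The records are hypothetical, and the counts are only as complete as the database that produced them.

```python
from collections import Counter

# Hypothetical search results:
# (citing article, cited article, author of the cited article)
CITATIONS = [
    ("a4", "a1", "Author A"),
    ("a5", "a1", "Author A"),
    ("a5", "a2", "Author A"),
    ("a6", "a3", "Author B"),
]


def citations_per_article(citations):
    """Count how often each cited article occurs in the records."""
    return Counter(cited for _, cited, _ in citations)


def citations_per_author(citations):
    """Count how often each cited author occurs in the records."""
    return Counter(author for _, _, author in citations)


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS))
```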

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
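
The three variants listed above can be generated automatically. The Python sketch below is a minimal helper for that purpose; the function name `url_variants` is hypothetical, and it assumes that `index.html` is the default page of the directory, which differs between web servers.

```python
def url_variants(url):
    """Return the common spellings of one web address, without the scheme."""
    # Drop "http://" or a similar scheme, keeping the leading "//" as on the slide.
    rest = url.split("://", 1)[-1]
    base = "//" + rest
    # Reduce the address to its shortest form (no default page, no trailing slash).
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the results combined.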

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 44


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
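
The difference between an alphabetical listing and a listing ranked by received links can be shown with a small Python sketch. It is a simplification under stated assumptions: the link data and site names are hypothetical, and plain counting of incoming links stands in for Google's actual link analysis, which is more elaborate.

```python
# Hypothetical link data: each site maps to the sites it links to.
LINKS = {
    "siteA": ["siteC"],
    "siteB": ["siteC", "siteA"],
    "siteC": [],
    "siteD": ["siteC", "siteA"],
}


def inlink_counts(links):
    """Count the links received by each site."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def rank_directory(entries, links):
    """Sort entries by number of received links; ties stay alphabetical."""
    counts = inlink_counts(links)
    return sorted(entries, key=lambda e: (-counts.get(e, 0), e))


entries = ["siteA", "siteB", "siteC", "siteD"]
print(sorted(entries))                 # alphabetical listing
print(rank_directory(entries, LINKS))  # listing ranked by received links
```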

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
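
The two components described above can be sketched in Python as a tiny inverted index plus a search function. The pages, their URLs and their texts are hypothetical stand-ins for what a crawler ("robot") would collect; real systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages already collected by a crawler ("robot").
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image search",
    "http://example.org/c": "Open archives and repositories",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "open marine")))
```

A query is answered from the prebuilt index only, which is why such a system never has to visit the pages themselves at search time, and also why its results can lag behind the current state of the WWW.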

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
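
Both ranking criteria (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified iterative link score in the spirit of the published PageRank idea. This Python sketch is not Google's actual algorithm, which is not public in full and uses many more signals; the link data, the function name `link_rank` and the damping value 0.85 are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page passes part of its score to the pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link data: A, B and D all link to C; C links to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(link_rank(LINKS).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph C scores highest because many pages point to it, and A outranks B and D with only one incoming link because that link comes from the "important" page C.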

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
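
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration under assumptions: the mini-thesaurus SYNONYMS and the function names are hypothetical, and the synonym list that Google really uses is not public.

```python
# Hypothetical mini-thesaurus; a real system would use a far larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, thesaurus=SYNONYMS):
    """Return one group of alternatives per query word.

    Only words marked with a leading tilde are expanded with synonyms.
    """
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append([word] + thesaurus.get(word.lower(), []))
        else:
            groups.append([token])
    return groups


def to_boolean(groups):
    """Write the groups as an explicit Boolean query: OR inside a group, AND between groups."""
    parts = []
    for group in groups:
        parts.append(group[0] if len(group) == 1 else "(" + " OR ".join(group) + ")")
    return " AND ".join(parts)


expanded = to_boolean(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded here; "cheap" is left as it is, although the thesaurus also knows synonyms for it.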

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.

• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.

• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
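
Recall and precision can be computed for a single search as in the sketch below. The document identifiers are made up, and the function name precision_recall is an assumption used only for this illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are really relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
```

Here 2 of the 4 retrieved documents are relevant (precision 1/2), and 2 of the 3 relevant documents were found (recall 2/3).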

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
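
What manual truncation means in retrieval systems that do offer it can be sketched as follows. The asterisk as truncation symbol, the word list and the function name truncation_match are assumptions chosen for this illustration.

```python
def truncation_match(pattern, words):
    """Right truncation: 'librar*' matches every word that starts with 'librar'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # Without the truncation symbol, only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]


vocabulary = ["library", "libraries", "librarian", "liberty", "Library"]
matches = truncation_match("librar*", vocabulary)
```

Without truncation, the searcher has to type each variant of the word explicitly and combine the variants with OR.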

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
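
A proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched like this. The function name near, the default distance of 5 words and the sample sentence are assumptions for illustration.

```python
def near(text, term_a, term_b, max_distance=5):
    """Return True when both terms occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sentence = "Open access journals are indexed in a directory."
close_together = near(sentence, "open", "journals", max_distance=3)
far_apart = near(sentence, "open", "directory", max_distance=3)
```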

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
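
Clustering on the fly can be illustrated with a deliberately simple sketch that groups results by a shared word in their titles. This is not the method used by Vivisimo, which is not described here. The stop word list, the sample titles and the function name cluster_results are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "in", "of", "the"}


def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_terms = set(query.lower().split())
    term_sets = []
    for title in titles:
        words = {w.strip(".,:;!?").lower() for w in title.split()}
        term_sets.append(
            {w for w in words if w and w not in STOPWORDS and w not in query_terms}
        )
    # Count in how many results each word occurs.
    frequency = Counter(t for terms in term_sets for t in terms)
    clusters = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        # Label = most frequent word; ties are broken alphabetically.
        label = min(terms, key=lambda t: (-frequency[t], t)) if terms else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
]
clusters = cluster_results("pascal", titles)
```

Nothing is prepared before the search: the groups are derived only from the results that come back for this one query.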

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
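
The time saving comes from sending one query to several search systems at the same time. A minimal sketch, assuming two hypothetical stand-in functions engine_a and engine_b instead of real search systems:

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; collect the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"A": engine_a, "B": engine_b})
```

The waiting time is then roughly that of the slowest search system, not the sum of all of them.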

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
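
Integration with removal of repeated results can be sketched as follows. The interleaving order and the crude URL normalisation are assumptions made for this illustration; real meta-search systems may merge and rank quite differently.

```python
def normalise(url):
    """Crude normalisation so that trivial variants count as the same result."""
    url = url.strip().lower()
    if url.startswith("http://www."):
        url = "http://" + url[len("http://www."):]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave the ranked lists of several engines, keeping each URL only once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


merged = merge_results([
    ["http://www.vivisimo.com/", "http://dogpile.com/"],
    ["http://vivisimo.com", "http://www.kartoo.com"],
])
```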

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
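
The difference between a modified page and a new page can be made concrete with a sketch that compares stored fingerprints of page contents. The URLs and page texts are made up, and no page is really fetched here; how the services mentioned in this section work internally is not known.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text; differences in white space only are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored, current_pages):
    """Return (changed_urls, new_urls) and update the stored fingerprints."""
    changed, new = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new


stored = {}
first_run = detect_changes(stored, {"http://example.org/a": "hello  world",
                                    "http://example.org/b": "old text"})
second_run = detect_changes(stored, {"http://example.org/a": "hello world",
                                     "http://example.org/b": "new text",
                                     "http://example.org/c": "brand new page"})
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.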

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
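
Matching new books against a stored interest profile can be sketched as follows. The profile, the book records and the function name matching_books are hypothetical; this does not describe how Amazon or any other bookshop implements its alert service.

```python
def matching_books(profile, new_books):
    """Return the titles of new books that fit the keywords or subject categories."""
    keywords = {k.lower() for k in profile.get("keywords", [])}
    categories = {c.lower() for c in profile.get("categories", [])}
    alerts = []
    for book in new_books:
        title_words = {w.strip(".,:;!?").lower() for w in book["title"].split()}
        if keywords & title_words or book.get("category", "").lower() in categories:
            alerts.append(book["title"])
    return alerts


# Hypothetical interest profile stored on the server.
profile = {"keywords": ["oceanography"], "categories": ["Information science"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Searching the Web", "category": "Information science"},
    {"title": "Cooking for Beginners", "category": "Food"},
]
alerts = matching_books(profile, new_books)
```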

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
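The workflow described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny hand-made thesaurus: the entries, the function name expand_query and the OR-based query syntax are hypothetical and are not taken from any real thesaurus system or search engine.

```python
# Hypothetical mini-thesaurus: each term maps relation codes to lists of terms.
# BT = broader term, NT = narrower term, RT = related term, UF = used for (synonyms).
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR-query made of the term plus its selected thesaurus terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
```

Adding synonyms and narrower terms in this way is aimed mainly at higher recall; replacing a vague word by a more suitable term found in the thesaurus is aimed mainly at higher precision.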

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
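The two directions of citation searching can be illustrated with a small Python sketch. The document names and reference lists below are invented for the illustration; a real citation index is of course built from the reference lists of many thousands of publications.

```python
# Hypothetical reference lists: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Follow reference lists backwards in time, collecting all older documents."""
    found = []
    queue = list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(references.get(current, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


citation_index = build_citation_index(REFERENCES)
print(snowball("seed", REFERENCES))      # older publications, found via the seed
print(citation_index.get("seed", []))    # more recent publications citing the seed
```

The snowball search needs only the documents themselves, while the forward search needs an index that has been built in advance from the reference lists of many other documents.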

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
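The variants listed above can be generated automatically before they are pasted into the link search of a search engine. The following Python sketch is only an illustration; the function name url_variants is hypothetical, and it assumes that the default page of the site is named index.html (or index.htm), which is not true for every web server.

```python
def url_variants(url):
    """Return three common spellings of the same web address."""
    # Drop the scheme and leading slashes so that the variants
    # match the "//host/path" style used on the slide.
    if "://" in url:
        url = url.split("://", 1)[1]
    base = url.lstrip("/")
    # Assumption: the default page is called index.html or index.htm.
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    variants = [base + "/index.html", base + "/", base]
    return ["//" + v for v in variants]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then searched separately, and the result sets are combined, because a search system may treat the three spellings as different link targets.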

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 45

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
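The mechanism described above can be sketched as follows in Python. This is a strongly simplified illustration and not the implementation of any real search engine: the pages are a small invented collection held in memory instead of being collected by a robot, and the implicit AND between query words is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected: URL -> text.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


index = build_index(PAGES)
print(search(index, "marine"))
print(search(index, "marine science"))
```

The query is answered from the index that was built in advance, not from the pages on the Internet themselves, which is why such a system can respond quickly but may also return links to pages that have changed or disappeared since they were collected.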

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
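The idea of ranking by links can be illustrated with a small Python sketch. It is a simplified iteration in the style of the published PageRank method, and not the actual algorithm used by Google, which is not public in detail. The link graph, the function name link_rank, the damping value 0.85 and the number of iterations are assumptions made for this illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from highly scored pages count for more."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page c scores higher than a and b because two pages point to it, and page d also scores higher than a and b although only one page points to it, because that one page is itself "important".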

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
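
What such an automatic expansion amounts to can be sketched as follows. The mini-thesaurus, the function names and the choice to join the terms with AND are assumptions of this sketch; it does not reproduce how Google implements the tilde operator.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into a group holding the word plus its synonyms."""
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([token])
    return groups


def to_boolean(groups):
    """Write the groups as a Boolean query (AND between groups is assumed)."""
    parts = []
    for alternatives in groups:
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(parts)


expanded = to_boolean(expand_query("used ~car prices"))
```

Only the word marked with the tilde is widened with OR alternatives; the other words are left as typed.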

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
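
Recall and precision, as used above, follow the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents returned by the search system.
    relevant:  documents that are actually relevant to the information need.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant ones exist,
# 2 of the retrieved documents are relevant.
precision, recall = precision_recall(
    ["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"]
)
```

Here half of the retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3).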

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
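
For readers who have not used truncation: in systems that do support it, a truncated query term such as librar* stands for every indexed word that starts with librar. A minimal sketch, assuming a lowercase index vocabulary and the asterisk as truncation symbol (both are assumptions of this example):

```python
def expand_truncated(term, vocabulary):
    """Expand a right-truncated term (ending in *) against an index vocabulary."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(word for word in vocabulary if word.startswith(stem))
    return [term] if term in vocabulary else []


# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty"}
matches = expand_truncated("librar*", vocabulary)
```

Without truncation, the searcher must type each of these word forms explicitly and combine them with OR.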

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
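
A proximity operator such as NEAR requires the two search words to occur within a few words of each other in the text. The sketch below shows the idea; the function name, the simple tokenisation and the default distance are assumptions made for illustration.

```python
def near(text, word1, word2, max_distance=5):
    """True when word1 and word2 occur within max_distance words of each other."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)


close = near("Open access journals are free to read.", "open", "journals", 2)
far = near("Open archives exist, but many journals are fee-based.",
           "open", "journals", 2)
```

A plain AND query would retrieve both sentences; the proximity condition keeps only the first.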

96

**--

Meta-search systems:
scheme 1
[Diagram: the user works at a client computer with a WWW client program (In / Out); this connects to a WWW server computer, which in turn reaches, over the Internet / WWW, the WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user works at a client computer with a multi-threaded Internet search client program (In / Out), which connects directly, over the Internet / WWW, to the WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user works at a client computer with a WWW client program (In / Out); this connects to a WWW server computer, which in turn reaches, over the Internet / WWW, the WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
[Diagram: the user queries an Internet meta-search system, which passes the query to Internet search system 1 and Internet search system 2; each of these searches its own collected database (1 and 2), built from WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user works at a client computer with a multi-threaded Internet search client program (In / Out), which connects directly, over the Internet / WWW, to the WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
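
The integration step with removal of repeated results can be sketched as follows. This is an illustration only: the interleaving by rank position and the rule that treats the "www." and non-"www." forms of a host as the same result are assumptions of the sketch, not a description of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (assumed rules: ignore scheme,
    letter case of the host, a leading 'www.' and a trailing slash)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results(engine_a, engine_b)
```

The first result of each system is taken before the second of any system, so no single system dominates the merged list.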

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
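
The difference between a modified page and a new page can be made concrete with a small sketch. It assumes that the tracking service stores a fingerprint (here a SHA-256 hash) of each page it has seen; the URLs and page texts are hypothetical, and fetching the pages is left out.

```python
import hashlib


def fingerprint(content):
    """Short, fixed-length summary of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with the pages as they are now.

    previous: dict url -> fingerprint stored at the last visit.
    current:  dict url -> page text retrieved now.
    Returns (new_pages, changed_pages) as sorted lists of URLs.
    """
    new_pages, changed_pages = [], []
    for url, text in current.items():
        if url not in previous:
            new_pages.append(url)
        elif previous[url] != fingerprint(text):
            changed_pages.append(url)
    return sorted(new_pages), sorted(changed_pages)


# Hypothetical state stored at the previous visit.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
# Hypothetical pages as retrieved now.
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
new_pages, changed_pages = detect_changes(previous, current)
```

An alert service would then email the two lists to the subscriber.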

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
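
How a stored interest profile can be matched against newly published books is sketched below. The profile structure, the field names and the sample book records are hypothetical; this does not describe how Amazon or any other bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True when a book fits the profile by keyword or by subject category."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    keyword_hit = any(k.lower() in title_words
                      for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def alerts(new_books, profile):
    """Titles of the new books that should be reported to the user."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical profile stored on the server and hypothetical new books.
profile = {"keywords": ["retrieval"], "categories": ["Library science"]}
new_books = [
    {"title": "Information Retrieval: an introduction", "category": "Computing"},
    {"title": "Managing the academic library", "category": "Library science"},
    {"title": "Coastal engineering", "category": "Engineering"},
]
to_report = alerts(new_books, profile)
```

The first book is reported through its keyword, the second through its subject category, and the third is not reported.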

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
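
Simultaneous, parallel searching can be sketched with one thread per target system. The catalogue functions below are in-memory stand-ins invented for this example (real services would be contacted over the network, for instance through Z39.50), and returning an empty list for an unavailable target is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, catalogues, timeout=10):
    """Send one query to several catalogues in parallel.

    catalogues: dict name -> callable(query) returning a list of titles.
    A target that fails or times out contributes an empty list.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(catalogues) or 1) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in catalogues.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = []
    return results


# Hypothetical stand-ins for a library catalogue, a bookshop and an
# unavailable target system.
def library_catalogue(query):
    titles = ["Marine ecology", "Water quality handbook"]
    return [t for t in titles if query.lower() in t.lower()]


def bookshop_catalogue(query):
    titles = ["Water management", "Coastal engineering"]
    return [t for t in titles if query.lower() in t.lower()]


def offline_catalogue(query):
    raise ConnectionError("target system unavailable")


found = search_all("water", {
    "library": library_catalogue,
    "bookshop": bookshop_catalogue,
    "offline": offline_catalogue,
})
```

The empty list for the unavailable target illustrates why the result depends on the availability of the target systems.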

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, the system offers not only a thumbnail
but also the origin as a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors taken from a controlled list of words
and terms, the searcher can first consult one or several
thesaurus systems to find additional and more suitable
words and terms, and then use these to formulate a query
for that database (to increase recall and precision).
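
The use of a thesaurus before querying a database can be sketched as follows. This is only an illustration: the thesaurus entries, the function names and the AND/OR query syntax are assumptions made for this sketch, and the syntax that a real database accepts must be checked in its help pages.

```python
# Toy thesaurus: every entry below is illustrative, not taken from a real thesaurus.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term followed by the terms reached through the chosen relations."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, relations=("UF", "NT")):
    """Join the concepts with AND and the variants of each concept with OR."""
    groups = []
    for term in terms:
        # Quote multi-word variants so that they are searched as a phrase.
        variants = ['"%s"' % v if " " in v else v for v in expand_term(term, relations)]
        if len(variants) > 1:
            groups.append("(" + " OR ".join(variants) + ")")
        else:
            groups.append(variants[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms (UF) and narrower terms (NT) tends to increase recall; leaving out related terms (RT) and broader terms (BT), as done by default here, is one possible way to protect precision.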

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
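
The two directions of citation searching can be sketched with a small citation graph. The document identifiers and the data are hypothetical; a real citation index stores the same kind of relation for millions of documents.

```python
# Hypothetical citation data: each key cites the documents in its list.
CITES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990", "paper_1985"],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in CITES.get(doc, [])} - found
        found |= frontier
    return sorted(found)


def cited_by(seed):
    """Find more recent documents that cite the seed (the job of a citation index)."""
    return sorted(doc for doc, refs in CITES.items() if seed in refs)


print(snowball("seed_2000"))   # older publications
print(cited_by("seed_2000"))   # more recent publications
```

Snowballing needs only the reference lists of the documents at hand, whereas the forward direction requires an index that has been built in advance from many citing documents.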

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
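
Estimating these numbers from the results of a citation search comes down to counting. A minimal sketch, assuming hypothetical article identifiers, author names and a list of (citing, cited) pairs:

```python
from collections import Counter

# Hypothetical data: the author(s) of each article, and (citing, cited) pairs.
AUTHORS = {"a1": ["Smith"], "a2": ["Smith", "Jones"], "a3": ["Lee"]}
CITATIONS = [("a2", "a1"), ("a3", "a1"), ("a3", "a2")]

# Citations received by each journal article.
article_counts = Counter(cited for _, cited in CITATIONS)

# Citations received by each author: the sum over that author's articles.
author_counts = Counter()
for article, count in article_counts.items():
    for author in AUTHORS[article]:
        author_counts[author] += count

print(article_counts)
print(author_counts)
```

The counts are only as complete as the database behind the search, which is why they are best read as estimates.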

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
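
Generating the variants can be automated. The sketch below assumes that the default document of the site is named index.html; other servers use other names, so the list may need to be extended. The function name is an assumption made for this illustration.

```python
def url_variants(url):
    """Return the spellings under which the same page may be linked."""
    # Drop the scheme so that the variants match the "//host/path" notation above.
    rest = url.split("://", 1)[-1]
    # Reduce the address to its shortest form first.
    for suffix in ("/index.html", "/"):
        if rest.endswith(suffix):
            rest = rest[: -len(suffix)]
            break
    return [rest + "/index.html", rest + "/", rest]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link search, and the results combined.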

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 46


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources / systems / services
directly useful

• Secondary sources / systems / services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
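
The mechanism can be illustrated with a toy inverted index. The two pages below are hypothetical stand-ins for what a crawler would collect; real systems add ranking, stemming and much larger data structures, none of which is shown here.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "marine"))
print(search(index, "marine sea"))
```

The query is answered from the index alone, without contacting the original sites, which is why results can be fast but also out of date.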

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
at most 10 words could be used in a query.)
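The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a small link-analysis sketch. This is not Google's actual algorithm; it is a simplified power iteration over a hypothetical toy web, and the function name `link_rank`, the damping value and the page names are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by their incoming links using simple power iteration.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A, B and C point to "hub"; A also points to E;
# "hub" points to D. D and E each have exactly one incoming link.
toy_web = {
    "A": ["hub", "E"],
    "B": ["hub"],
    "C": ["hub"],
    "hub": ["D"],
    "D": [],
    "E": [],
}
scores = link_rank(toy_web)
```

In this toy web, "hub" scores higher than A because many pages point to it, and D scores higher than E although each has only one incoming link, because the link to D comes from a more important page.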

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
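What such an automatic expansion amounts to can be sketched as follows. This only illustrates the principle; it does not show how Google implements the tilde operator. The synonym table and the function name `expand_query` are hypothetical.

```python
# Hypothetical, hand-made synonym table; a real system would rely on a
# much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("cheap ~car rental")
# expanded is "cheap (car OR automobile OR vehicle) rental"
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table lists synonyms for it.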

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
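Recall and precision can be made concrete with a small calculation. The sketch below assumes the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant) and uses invented document identifiers; the two result lists are hypothetical and are not measurements of any real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant documents exist (d1..d4).
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_engine, relevant_docs)          # (0.25, 0.5)
specialised = precision_recall(specialised_engine, relevant_docs)  # (0.75, 0.75)
```

In this invented case the specialised system returns fewer irrelevant hits (higher precision) and misses fewer relevant documents (higher recall).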

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
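When truncation is not available, the searcher has to type the word variants explicitly. The sketch below shows what right-hand truncation does in systems that do support it, and how the matched variants can be combined with OR as a manual workaround. The vocabulary, the `*` truncation symbol and the function name `truncation_match` are assumptions for illustration.

```python
def truncation_match(pattern, vocabulary):
    """Return the index terms matched by a right-truncated pattern like 'librar*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return sorted(t for t in vocabulary if t.lower().startswith(prefix))
    # Without the truncation symbol only the exact term matches.
    return sorted(t for t in vocabulary if t.lower() == pattern.lower())


# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]

matches = truncation_match("librar*", vocabulary)
# matches is ["librarian", "libraries", "library"]

# Manual workaround for a system without truncation: list the variants.
workaround = " OR ".join(matches)
# workaround is "librarian OR libraries OR library"
```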

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
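Clustering "on the fly" means that only the retrieved result descriptions are analysed, after the search. The sketch below is a deliberately naive illustration of that idea and not Vivisimo's method: each result is placed under the word it shares with the most other results. The stop word list, the snippets and the function name `cluster_results` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, very small stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_results(snippets, query):
    """Group result snippets under the most widely shared non-query word."""
    query_terms = set(query.lower().split())

    def terms(text):
        return {
            word for word in text.lower().split()
            if word not in STOPWORDS and word not in query_terms
        }

    # Count in how many snippets each word occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Pick the most widely shared word; ties are broken alphabetically.
            label = max(candidates, key=lambda w: (doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
results = [
    "Pascal programming tutorial",
    "Pascal programming language compiler",
    "Blaise Pascal philosopher biography",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results(results, "pascal")
```

With these invented snippets the results fall into two groups, labelled "programming" and "philosopher", which mirrors the kind of disambiguation that clustering search systems aim for.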

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
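The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, not the method of any particular meta-search system: ranked lists are interleaved and a page is kept only the first time it appears. Treating addresses with and without "www." as the same page, and ignoring query strings, are simplifying assumptions; the URLs and function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists, keeping the first copy of each page."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/", "http://example.org/a"]
engine_b = ["http://example.org", "http://example.org/b"]

merged = merge_results(engine_a, engine_b)
# merged keeps 3 of the 4 results; the repeated home page appears once.
```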

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
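The difference between a modified page and a new page can be sketched with stored fingerprints. This is an illustrative outline, not the method of any of the services mentioned: a fingerprint (here a SHA-256 hash of the page text) is stored per address at each visit, and the next visit compares against it. The page texts are supplied directly instead of being downloaded, and the addresses and function names are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text, ignoring differences in whitespace only."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored, current_pages):
    """Compare current pages with stored fingerprints.

    stored: dict mapping address -> fingerprint from the previous visit
            (updated in place).
    current_pages: dict mapping address -> page text seen now.
    Returns (new_addresses, modified_addresses).
    """
    new, modified = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            modified.append(url)
        stored[url] = digest
    return new, modified


# Hypothetical first visit: one known page.
stored = {}
detect_changes(stored, {"http://example.org/a": "Old text"})

# Hypothetical second visit: page a has changed and page b has appeared.
new_pages, modified_pages = detect_changes(
    stored,
    {
        "http://example.org/a": "Updated text",
        "http://example.org/b": "Brand new page",
    },
)
```

An alert service would then send `new_pages` and `modified_pages` to the subscriber, for instance by email.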

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
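How a stored interest profile can be matched against newly published books is sketched below. This is a hypothetical illustration and does not describe the system of any bookshop; the profile fields `keywords` and `categories`, the book records and the function name `matches_profile` are assumptions.

```python
def matches_profile(book, profile):
    """Return True when a new book fits the stored interest profile."""
    subjects = {subject.lower() for subject in book["subjects"]}
    text = (book["title"] + " " + " ".join(book["subjects"])).lower()
    # A keyword matches anywhere in the title or subject terms.
    keyword_hit = any(k.lower() in text for k in profile.get("keywords", []))
    # A category must equal one of the subject categories of the book.
    category_hit = any(c.lower() in subjects for c in profile.get("categories", []))
    return keyword_hit or category_hit


# Hypothetical profile stored on the server for one user.
profile = {"keywords": ["aquaculture"], "categories": ["Marine biology"]}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Aquaculture", "subjects": ["Fisheries"]},
    {"title": "Coastal Ecosystems", "subjects": ["Marine biology"]},
    {"title": "Medieval History", "subjects": ["History"]},
]

alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
# alerts is ["Introduction to Aquaculture", "Coastal Ecosystems"]
```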

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a Term to other terms:
»BT (= Broader Term): term(s) with broader meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)
»NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
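
The query-building step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus THESAURUS and the function expand_query are hypothetical, and the Boolean OR syntax is assumed to be accepted by the database that is searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only
# and are not taken from any real thesaurus system.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "NT": ["coastal waters"],   # narrower terms
        "BT": ["water body"],       # broader terms
        "RT": ["marine science"],   # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and its thesaurus relatives."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms tends to raise recall; adding broader or related terms (relations=("BT", "RT")) widens the search further, usually at the cost of precision.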

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
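
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the REFERENCES table are invented for the example; a real citation index holds the same kind of data at a much larger scale.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["a1990", "b1995"],
    "b1995": ["a1990", "c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def snowball(doc, references):
    """Follow reference lists back in time and collect every older document reached."""
    found = set()
    stack = [doc]
    while stack:
        current = stack.pop()
        for cited in references.get(current, []):
            if cited not in found:
                found.add(cited)
                stack.append(cited)
    return found


def cited_by(doc, references):
    """Return the more recent documents that cite doc, as a citation index would."""
    return {citing for citing, refs in references.items() if doc in refs}


if __name__ == "__main__":
    print(sorted(snowball("seed", REFERENCES)))   # older publications
    print(sorted(cited_by("seed", REFERENCES)))   # more recent publications
```

The first function needs only the reference lists printed in the documents themselves; the second needs an index built over many documents, which is why a citation database is required for searching forward in time.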

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
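
Generating these variants by hand is error-prone, so a small helper may be useful. The sketch below is an assumption about how this could be done: the function url_variants and the list of index file names are hypothetical, and the link-search syntax itself still depends on the search system used.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the common spellings under which the start page of a site may be linked."""
    base = url
    # Strip a trailing index file name, if present.
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]
            break
    # Strip the trailing slash, so that all variants are built from the same base.
    base = base.rstrip("/")
    return [base + "/" + index_names[0], base + "/", base]


if __name__ == "__main__":
    for variant in url_variants("//web-server-computer.country/website/"):
        print(variant)
```

Each variant can then be submitted as a separate link query, and the result lists can be merged.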

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
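
The "simply stated" version of this ranking can be sketched as counting incoming links. This is a deliberately simplified illustration, not the method actually used by Google: the function rank_by_inlinks and the example site names are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(sites, links):
    """Sort sites by number of incoming links, highest first; ties in alphabetical order."""
    counts = Counter(target for _source, target in links if target in sites)
    return sorted(sites, key=lambda site: (-counts[site], site))


if __name__ == "__main__":
    directory_entries = ["a.example", "b.example", "c.example"]
    observed_links = [
        ("x.example", "b.example"),
        ("y.example", "b.example"),
        ("x.example", "c.example"),
    ]
    print(rank_by_inlinks(directory_entries, observed_links))
```

An alphabetical listing, as in DMOZ, corresponds to sorting on the site name only; here the link count is the primary sort key and the name only breaks ties.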

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
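
The two components described above can be illustrated with a very small Python sketch of an inverted index. The page texts stand in for what a crawler would collect, the URLs are invented, and the functions build_index and search are hypothetical names; real systems add ranking, phrase search and much more.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical pages, as if collected earlier by a crawler ("robot").
    collected = {
        "http://one.example/": "Marine biology of the North Sea",
        "http://two.example/": "Hydrology and marine science",
    }
    index = build_index(collected)
    print(sorted(search(index, "marine")))
    print(sorted(search(index, "marine sea")))
```

Because a query is answered from the prepared index and not from the live Internet, the answer is fast, but it reflects the pages as they were when the crawler last visited them.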

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
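
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a small sketch. This is only a simplified link-based scoring loop in the spirit of the description above, not the actual algorithm used by Google; the function name `link_rank`, the damping value and the toy link graph are assumptions made for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]} (illustrative only)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score in each round.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: "a" and "b" point to "c"; "c" points to "d".
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = link_rank(toy_web)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

In this toy graph, "c" scores higher than "a" because two pages point to it, and "d" scores high because the well-linked page "c" points to it.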

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
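
What such an automatic expansion does can be sketched as follows. This is a minimal illustration, assuming a tiny hand-made synonym list; the names `SYNONYMS` and `expand_query` are hypothetical, and nothing is claimed here about how Google implements the tilde operator internally.

```python
# Hypothetical mini-thesaurus; a real system would use a far larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every word marked with a leading tilde by an OR group of synonyms."""
    expanded = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(base)
        else:
            expanded.append(word)
    return " ".join(expanded)


print(expand_query("~cheap hotel brussels"))
# (cheap OR inexpensive OR affordable) hotel brussels
```

Only the marked word is expanded; the other words in the query are left untouched.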

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
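
Recall and precision, mentioned above, are standard retrieval measures that can be computed for a single search once it is known which documents are relevant. The sketch below is a minimal illustration; the function name `precision_recall` and the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).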

90

**--

Internet indexes:
non-global, regional systems

[Diagram of nested areas: the complete WWW contains the part
covered by a global / international Internet index, which in turn
contains the part covered by an index limited to
sources in/of a country or region.]

91

**--

Internet indexes:
subject-specific, specialised systems

[Diagram of nested areas: the complete WWW contains the part
covered by a global / international Internet index, which in turn
contains the part covered by an Internet index limited to
sources related to a specific subject.]

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
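
To make clear what is missing, the following sketch shows what right-hand truncation does in a retrieval system that does support it. It is an illustration only, assuming a small made-up vocabulary and the asterisk as truncation symbol; the function name `expand_truncation` is hypothetical.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a term such as 'librar*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol, only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == term.lower())


# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
print(expand_truncation("librar*", vocabulary))
# ['librarian', 'libraries', 'library']
```

In a system without truncation, the searcher has to type such word variants manually and combine them with OR.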

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
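
A proximity operator checks whether two words occur close to each other in a document. The sketch below shows one simple way this could work, assuming that distance is counted in words; the function name `near` and the default distance are choices made for this illustration and do not describe any particular search system.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Open access journals are listed in a directory"
print(near(sentence, "open", "journals", max_distance=2))   # True
print(near(sentence, "open", "directory", max_distance=3))  # False
```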

96

**--

Meta-search systems:
scheme 1
[Diagram: the user (in / out) works at a client computer with a
WWW client program, which connects to a WWW server computer,
which in turn reaches, over the Internet / WWW, the WWW server
computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (in / out) works at a client computer with a
multi-threaded Internet search client program, which connects,
over the Internet / WWW, to the WWW server computers with
Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user (in / out) works at a client computer with a
WWW client program, which connects to a WWW server computer,
which in turn reaches, over the Internet / WWW, the WWW server
computers with Internet search systems.]

100

**--

Meta-search systems:
relations
[Diagram: the user queries an Internet meta-search system, which
queries Internet search system 1 and Internet search system 2;
each Internet search system has its own collected database
(1 and 2), and both databases are collected from WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user (in / out) works at a client computer with a
multi-threaded Internet search client program, which connects,
over the Internet / WWW, to the WWW server computers with
Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
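
The time saving comes from sending one query to several sources at the same time. The sketch below illustrates this "multi-threaded" principle with Python threads. The two search functions are hypothetical stand-ins that return fixed lists of example URLs; a real meta-search system would contact remote search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_index_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def search_index_b(query):
    return ["http://example.org/2", "http://example.org/3"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel; return results per engine."""
    if not engines:
        return {}
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(func, query) for name, func in engines.items()}
        # Collect the answers once every engine has responded.
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"A": search_index_a, "B": search_index_b})
for name, urls in results.items():
    print(name, urls)
```

The user types the query once; the waiting time is roughly that of the slowest source rather than the sum of all sources.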

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
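
Such an integration step can be sketched as follows. This is one simple possibility, assuming that each primary search system returns a ranked list of URLs: the lists are interleaved rank by rank and a URL that was already seen is skipped. The function name `merge_results` and the example URLs are made up for the illustration.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = url.rstrip("/")  # treat ".../" and "..." as the same page
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


list_a = ["http://example.org/1", "http://example.org/2"]
list_b = ["http://example.org/2/", "http://example.org/3"]
print(merge_results([list_a, list_b]))
# ['http://example.org/1', 'http://example.org/2/', 'http://example.org/3']
```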

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: the Internet includes telnet, ftp, ... and the WWW.
Shown within it:
static texts in the WWW that can be indexed ( = on HTTP server
computers), covered partly by Internet indexes, including Word
files and PDF files;
databases and file archives accessible through the Internet
(CGI, ASP,...);
information accessible only when passwords are used;
rapidly changing information, such as news.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
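
The difference between tracking a modified page and finding a new page can be sketched as follows. The sketch assumes that the service keeps a fingerprint (hash) of each page it saw during the previous visit; the function names and example URLs are hypothetical, and fetching the pages from the WWW is deliberately left out.

```python
import hashlib


def fingerprint(page_content):
    """Return a short fixed-length fingerprint of the page text."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: {url: fingerprint from the last visit}
    current:  {url: page text fetched now}
    Returns (changed_urls, new_urls).
    """
    changed = [url for url, text in current.items()
               if url in previous and previous[url] != fingerprint(text)]
    new = [url for url in current if url not in previous]
    return changed, new


# Hypothetical state from the previous visit and the pages seen today.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
print(detect_changes(previous, current))
# (['http://example.org/a'], ['http://example.org/c'])
```

An alert service would then send the two lists to the subscriber by email.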

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
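
How a stored interest profile can be matched against newly published books is sketched below. This is a simplified illustration, not a description of the Amazon service: the record fields (`title`, `category`), the profile structure and the function name `matches_profile` are assumptions made for the example.

```python
def matches_profile(book, profile):
    """Return True if a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(keyword.lower() in title_words
                      for keyword in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


# Hypothetical profile stored on the server of the alert system.
profile = {"keywords": ["oceanography", "retrieval"], "categories": ["Library science"]}

new_books = [
    {"title": "Introduction to information retrieval", "category": "Computer science"},
    {"title": "Cataloguing rules", "category": "Library science"},
    {"title": "Cooking for beginners", "category": "Food"},
]
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
# ['Introduction to information retrieval', 'Cataloguing rules']
```

The first book matches through a keyword in the title, the second through its subject category.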

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
(scheme: a central Term and its relations to other terms)
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
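The approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus entries are invented for the example, and the Boolean syntax (AND, OR, quoted phrases) is a common convention that may differ from the syntax of the database you actually search.

```python
def expand_query(concepts, thesaurus):
    """Build a Boolean query: alternatives joined by OR, concepts joined by AND."""
    groups = []
    for concept in concepts:
        # Start with the searcher's own word, then add thesaurus suggestions.
        terms = [concept]
        for term in thesaurus.get(concept, []):
            if term not in terms:
                terms.append(term)
        # Quote multi-word terms so that they are searched as exact phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical thesaurus entries, invented for illustration only.
THESAURUS = {
    "sea": ["ocean", "marine environment"],
    "pollution": ["contamination"],
}

print(expand_query(["sea", "pollution"], THESAURUS))
# (sea OR ocean OR "marine environment") AND (pollution OR contamination)
```

Joining the alternatives with OR mainly helps recall; choosing the more suitable terms from the thesaurus is what helps precision.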

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
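The two directions of citation searching can be illustrated with a small Python sketch. The document names and reference lists below are invented; a real citation index covers millions of records, and the function names are hypothetical.

```python
def snowball(seed, references):
    """Backward search: older documents reached by following reference lists."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for ref in references.get(doc, []):
            if ref != seed and ref not in found:
                found.add(ref)
                todo.append(ref)  # follow the references of this reference too
    return found


def citing(seed, references):
    """Forward search: more recent documents that cite the seed,
    which is the kind of answer a citation index provides."""
    return {doc for doc, refs in references.items() if seed in refs}


# Hypothetical data: each document maps to the documents it cites.
REFERENCES = {
    "paper_2003": ["paper_2000", "paper_1998"],
    "paper_2000": ["paper_1995"],
    "paper_2004": ["paper_2003"],
    "paper_2005": ["paper_2003", "paper_2000"],
}

print(sorted(snowball("paper_2003", REFERENCES)))
print(sorted(citing("paper_2003", REFERENCES)))
```

The backward search needs only the documents themselves, whereas the forward search needs a database in which the reference lists of many documents have been indexed.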

211

***-

Information retrieval:
using citations (scheme)
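(scheme: a time axis running from past to future, with “now” marked;
starting from the seed document, snowball citation searching points
towards the past and citation indexing points towards the future)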

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
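Generating these variants can be automated. The sketch below is a minimal example; the function name is hypothetical, and it assumes that the default page of the site is called index.html (or index.htm), which depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the forms under which other pages may link to the same site."""
    base = url
    # Strip a default file name, if present (assumed: index.html or index.htm).
    for name in ("index.html", "index.htm"):
        if base.endswith("/" + name):
            base = base[: -len(name)]
            break
    # Strip the trailing slash to obtain the shortest form.
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://www.example.org/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system, and the results can be merged.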

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 48

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
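The simplified idea stated above (more links received, higher rank) can be sketched as follows. This is not Google's actual method, which is more elaborate; the site names and links are invented, and the function name is hypothetical.

```python
from collections import Counter


def rank_by_links_received(sites, links):
    """Sort directory entries by the number of links that point to them."""
    # Count only links whose target is one of the directory entries.
    received = Counter(target for _source, target in links if target in sites)
    # Most links first; ties fall back to alphabetical order.
    return sorted(sites, key=lambda site: (-received[site], site))


# Hypothetical directory entries and (source, target) links.
SITES = ["a.example", "b.example", "c.example"]
LINKS = [
    ("x.example", "b.example"),
    ("y.example", "b.example"),
    ("z.example", "c.example"),
    ("x.example", "other.example"),
]

print(rank_by_links_received(SITES, LINKS))
# ['b.example', 'c.example', 'a.example']
```

With no links at all, the function falls back to the plain alphabetical order described for the DMOZ directory.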

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
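A toy version of this mechanism can be written in Python. It is only a sketch: the pages are invented and already "collected", whereas a real system uses a robot to fetch them, and real indexes add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build the index: each word maps to the URLs of the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain all words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, standing in for what a robot would have collected.
PAGES = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and sea level",
    "http://c.example/": "Civil engineering",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "sea")))
print(sorted(search(INDEX, "north sea")))
```

The query is answered from the index alone; the pages themselves are not contacted at search time, which is why results can be slightly out of date.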

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible in many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is indexed as well,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
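
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a small iterative calculation. The slide only states that links are taken into account; the sketch below uses a well-known PageRank-style iteration purely as an illustration and does not describe the actual Google system. The function name link_rank, the damping value 0.85 and the tiny link graph are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it points to.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A, B and D point to C; C points back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C scores highest because three pages point to it, and A scores higher than B or D because the one page that points to A is itself important.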

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
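
What such an automatic expansion amounts to can be sketched as follows. This is a simplified illustration under stated assumptions: the synonym list is a hypothetical hand-made dictionary (the synonym source of the real system is not described here), and the function names expand_query and show_expanded are invented for the example.

```python
# Hypothetical synonym list; a real system would use a much larger resource.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_query(query, synonyms):
    # Turn each query term into a group of alternatives.
    # Only terms marked with a leading tilde are expanded.
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([term])
    return groups

def show_expanded(groups):
    # Write the expanded query with explicit OR between the alternatives.
    parts = []
    for group in groups:
        parts.append(group[0] if len(group) == 1 else "(" + " OR ".join(group) + ")")
    return " ".join(parts)

print(show_expanded(expand_query("cheap ~car rental", SYNONYMS)))
```

Only the marked word is widened; the other words of the query are left exactly as the searcher typed them.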

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
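
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers and the function name precision_recall are assumptions for the example.

```python
def precision_recall(retrieved, relevant):
    # retrieved: documents returned by the search system
    # relevant: documents that actually answer the information need
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 3 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here half of the retrieved documents are relevant (precision 0.50) and two of the three relevant documents were found (recall 0.67).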

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
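
A proximity operator such as NEAR checks whether two words occur within a given number of words of each other. The following is a minimal sketch of that test on a single text; the function name near and the default distance of 5 words are assumptions, since each retrieval system defines its own operator.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    # True when word_a and word_b occur at most max_distance words apart.
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "Open access to scientific information"
print(near(text, "open", "information"))                   # 4 words apart
print(near(text, "open", "information", max_distance=2))
```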

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
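
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, at the moment of the search. The sketch below shows one very simple way to do this and is not the method used by Vivisimo, which is not described here: each result is filed under the word in its snippet that is shared with the most other results. The snippets, the stopword list and the function name cluster_results are assumptions.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "and", "for", "his", "a", "of"}

def cluster_results(results, query):
    # results: list of (url, snippet) pairs returned for the query.
    query_words = set(re.findall(r"\w+", query.lower()))

    def terms(snippet):
        # Candidate label words: everything except stopwords and query words.
        return {w for w in re.findall(r"\w+", snippet.lower())
                if w not in STOPWORDS and w not in query_words}

    # Count in how many snippets each candidate word occurs.
    doc_freq = Counter()
    for _, snippet in results:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for url, snippet in results:
        candidates = terms(snippet)
        # Pick the most widely shared word; ties are broken alphabetically.
        label = (min(candidates, key=lambda w: (-doc_freq[w], w))
                 if candidates else "other")
        clusters[label].append(url)
    return dict(clusters)

# Hypothetical results for the ambiguous query "pascal".
results = [
    ("u1", "Blaise Pascal philosopher and mathematician"),
    ("u2", "Pascal programming language tutorial"),
    ("u3", "The philosopher Pascal and his writings"),
    ("u4", "Compilers for the Pascal programming language"),
]
clusters = cluster_results(results, "pascal")
print(clusters)
```

With these snippets the results about the person and the results about the programming language end up under separate headings, which is the kind of help with ambiguity that clustering is meant to give.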

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
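
The time saving comes from sending the same query to several search systems in parallel instead of one after the other. A minimal sketch of that fan-out, using the Python standard library, is given below. The two search functions are hypothetical stand-ins that return fixed example URLs; a real meta-search system would call the actual search services at this point.

```python
from concurrent.futures import ThreadPoolExecutor

def search_index_one(query):
    # Hypothetical stand-in for a call to a first search system.
    return [f"http://example.org/one?q={query}"]

def search_index_two(query):
    # Hypothetical stand-in for a call to a second search system.
    return [f"http://example.org/two?q={query}"]

def meta_search(query, engines):
    # Submit the query to every engine at once, then collect the answers.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}

engines = {"index one": search_index_one, "index two": search_index_two}
results = meta_search("coastal ecology", engines)
for name, hits in results.items():
    print(name, hits)
```

The user sees one interface (meta_search), while the differences between the underlying systems stay hidden behind it.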

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
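
Removing repeated results requires a decision about when two links count as "the same". The sketch below merges several result lists and treats addresses that differ only in a leading www., in upper/lower case of the host name, or in a trailing slash as duplicates. That rule is an assumption made for the example and does not hold for every WWW site; the function names normalize and merge_results are likewise invented.

```python
from urllib.parse import urlsplit

def normalize(url):
    # Reduce a URL to a comparison key (assumed rule, see the text above).
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query

def merge_results(result_lists):
    # Keep the first occurrence of each result, in the order received.
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

merged = merge_results([
    ["http://www.dogpile.com", "http://www.kartoo.com"],
    ["http://dogpile.com/", "http://www.mamma.com"],
])
print(merged)
```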

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
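
The distinction between modified pages and new pages can be sketched with stored fingerprints of page contents. This is a simplified illustration, not the method of any particular alert service: the page texts are given directly instead of being fetched from the WWW, and the function names fingerprint and detect_changes are assumptions.

```python
import hashlib

def fingerprint(content):
    # A short, fixed-length summary of the page content.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    # previous: URL -> fingerprint stored at the last check
    # current:  URL -> page content seen now
    changed, new = [], []
    for url, content in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(content):
            changed.append(url)
    return sorted(changed), sorted(new)

# Hypothetical state saved at the previous check.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
# Hypothetical contents found at the current check.
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "a page that did not exist before",
}
changed, new = detect_changes(previous, current)
print("changed:", changed)
print("new:", new)
```

Tracking a known page only needs its stored fingerprint; finding new pages additionally needs a source of candidate URLs, which is why alert services for new pages usually rely on an Internet index.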

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ states that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
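
An interest profile stored as keywords can be matched against the titles of newly published books as sketched below. The book records, the profile and the function name matching_books are assumptions for the example; real services may also match on subject categories, as mentioned above.

```python
import re

def matching_books(new_books, profile_keywords):
    # Return the titles of new books that contain at least one profile keyword.
    keywords = {keyword.lower() for keyword in profile_keywords}
    matches = []
    for book in new_books:
        title_words = set(re.findall(r"\w+", book["title"].lower()))
        if title_words & keywords:
            matches.append(book["title"])
    return matches

# Hypothetical list of newly announced books.
new_books = [
    {"title": "Coastal Ecology of the North Sea"},
    {"title": "A History of Printing"},
    {"title": "Information Retrieval in Practice"},
]
profile = ["ecology", "retrieval"]
alerts = matching_books(new_books, profile)
print(alerts)
```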

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
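
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration, not a real thesaurus system: the entry for "sea" is invented, and the function names `expand_query` and `to_boolean_query` are hypothetical. The resulting OR-query assumes a database that accepts Boolean operators and quoted phrases.

```python
# A tiny, invented thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["bay", "gulf"],     # narrower terms
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return the term plus the terms found through the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term.lower()]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_boolean_query(terms):
    """Combine the terms with OR, quoting multi-word terms as phrases."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(to_boolean_query(expand_query("sea")))
```

Adding synonyms and narrower terms tends to increase recall; replacing a vague word by a more suitable term from the thesaurus tends to increase precision.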

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
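
The two directions of citation searching can be sketched with a small, invented set of reference lists. The document identifiers and the function names are hypothetical; a real citation index covers millions of documents, but the principle is the same.

```python
# Invented citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1990", "older-1980"],
    "new-2003": ["seed-2000"],
    "new-2005": ["seed-2000", "new-2003"],
}


def backward_search(seed, depth=2):
    """Follow reference lists from the seed to older work (snowball effect)."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        # Collect the references of every document in the current frontier,
        # keeping only those that were not found in an earlier round.
        frontier = {ref for doc in frontier for ref in REFERENCES.get(doc, [])} - found
        found |= frontier
    return found


def forward_search(seed):
    """Return the documents whose reference lists contain the seed.

    This is the lookup that a citation index makes possible.
    """
    return {doc for doc, refs in REFERENCES.items() if seed in refs}


if __name__ == "__main__":
    print(sorted(backward_search("seed-2000")))
    print(sorted(forward_search("seed-2000")))
```

Backward searching needs only the documents themselves; forward searching needs a database in which the reference lists of many documents have been indexed.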

211

***-

Information retrieval:
using citations (scheme)
[Scheme: the seed document on a time axis (Past, Now, Future);
snowball citation searching points towards the past,
citation indexing towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
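
Counting citations per article and per author can be sketched as follows. The records are invented and the function names are hypothetical. Any such count is only an estimate, because it depends on which citing documents the database happens to cover.

```python
from collections import Counter

# Invented records: (citing article, cited article, author of the cited article).
CITATIONS = [
    ("paper-10", "paper-1", "Author A"),
    ("paper-11", "paper-1", "Author A"),
    ("paper-11", "paper-2", "Author A"),
    ("paper-12", "paper-3", "Author B"),
]


def citations_per_article(citations):
    """Count how often each cited article appears in the records."""
    return Counter(cited for _citing, cited, _author in citations)


def citations_per_author(citations):
    """Count how often work by each author appears as a cited item."""
    return Counter(author for _citing, _cited, author in citations)


if __name__ == "__main__":
    print(citations_per_article(CITATIONS).most_common())
    print(citations_per_author(CITATIONS).most_common())
```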

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
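
Generating these variants can be automated. The sketch below is an illustration only: the function name `url_variants` is hypothetical, and it assumes that the default document of the site is called index.html, which differs from server to server. The query operator for link searching is left out on purpose, because it depends on the search system.

```python
def url_variants(url, default_document="index.html"):
    """Return three common spellings of the same web address.

    The default document name is an assumption; adapt it to the site.
    """
    stem = url.strip()
    if stem.endswith("/" + default_document):
        # Drop the document name, keep the directory part.
        stem = stem[: -len(default_document)]
    stem = stem.rstrip("/")
    return [stem + "/" + default_document, stem + "/", stem]


if __name__ == "__main__":
    for variant in url_variants("//web-server-computer.country/website/"):
        print(variant)
```

Each variant is then searched separately, and the result lists are combined.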

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 49

1


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
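
The difference between the two orderings can be sketched as follows. The site names and link counts are invented, and the function names are hypothetical. Counting inbound links is the simplified picture given above; the link analysis that Google actually applies is more refined than a raw count.

```python
# Invented directory category: site name -> number of inbound links.
CATEGORY = {"Ocean Atlas": 120, "Aqua News": 15, "Marine Portal": 480}


def alphabetical(category):
    """Order the sites by name only, as in a plain directory listing."""
    return sorted(category)


def by_inbound_links(category):
    """Order the sites by number of inbound links, highest first.

    Ties are broken alphabetically so that the order is predictable.
    """
    return sorted(category, key=lambda site: (-category[site], site))


if __name__ == "__main__":
    print(alphabetical(CATEGORY))
    print(by_inbound_links(CATEGORY))
```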

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
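
The mechanism described above (collected pages, an automatically built index, and a query interface) can be illustrated with a minimal sketch. The page texts and example.org URLs are hypothetical stand-ins for what a crawler might have collected; real Internet indexes are far more elaborate.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering portal",
    "http://example.org/c": "Oceanography data for marine engineering",
}
index = build_index(pages)
print(sorted(search(index, "marine oceanography")))
```

The query is answered from the prebuilt index only, which is why no page has to be fetched from the Internet at query time.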

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
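
To make the idea of link-based ranking concrete, here is a minimal sketch of an iterative scoring scheme in the spirit of the published PageRank idea. It is an illustration under simplifying assumptions, not a description of Google's actual production algorithm; the tiny link graph, the damping value and the function name `link_rank` are hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages from their incoming links.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph C scores highest because many pages point to it, and A scores above B and D because the "important" page C points to it.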

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has been possible since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
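
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical; how Google selects synonyms internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling", "fisheries"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No known synonyms: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other query words are passed through unchanged.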

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be reached through the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered through http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent
system that competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
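
Recall and precision, the two quality measures named above, can be computed as in this small sketch. The document identifiers are hypothetical, and in practice the set of all relevant documents is rarely known exactly.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(precision, recall)
```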

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
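
For readers unfamiliar with truncation, this minimal sketch shows what a right-truncated query term such as librar* would match in systems that do support it. The vocabulary and the function name are hypothetical.

```python
def truncation_matches(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(w for w in vocabulary if w.startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w == pattern)

# Hypothetical vocabulary of indexed words.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
print(truncation_matches("librar*", vocabulary))
```

Without truncation or stemming, the searcher has to type each of these word variants explicitly.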

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
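
For comparison, a proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched like this. The function `near` and its default distance are assumptions made for illustration only.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

sentence = "Open access journals give free access to scientific articles"
print(near(sentence, "open", "journals", max_distance=2))
print(near(sentence, "open", "articles", max_distance=3))
```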

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
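
A very rough impression of clustering search results on the fly can be given with the sketch below, which groups hits under the keyword that their snippets share most often. This is a deliberately naive illustration and NOT the method used by Vivisimo; the result list, stop-word list and function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is"}

def cluster_results(query, results):
    """Group (url, snippet) results under their most widely shared keyword."""
    query_words = set(query.lower().split())
    keywords = {}
    counts = Counter()
    for url, snippet in results:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words -= STOPWORDS | query_words
        keywords[url] = words
        counts.update(words)
    clusters = defaultdict(list)
    for url, _ in results:
        # Label each hit with its keyword that occurs in the most hits;
        # ties are broken alphabetically to keep the output deterministic.
        label = min(keywords[url], key=lambda w: (-counts[w], w), default="other")
        clusters[label].append(url)
    return dict(clusters)

# Hypothetical hits for the ambiguous query "pascal".
results = [
    ("http://example.org/1", "Blaise Pascal, philosopher and mathematician"),
    ("http://example.org/2", "Pascal programming language tutorial"),
    ("http://example.org/3", "The philosopher Pascal and his wager"),
    ("http://example.org/4", "Compiler for the Pascal programming language"),
]
clusters = cluster_results("pascal", results)
print(clusters)
```

The grouping is computed only from the snippets returned for this one query, so nothing has to be classified before the search.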

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
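
The integration step (querying several primary search systems in parallel and removing repeated results) can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query to remote services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems; each returns ranked URLs.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/d"]

def meta_search(query, engines):
    """Query all engines in parallel and merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank so that the top hit of every engine comes first.
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(meta_search("oceanography", [engine_one, engine_two]))
```

Interleaving by rank is only one possible merging policy; it is chosen here because it needs no comparable relevance scores from the individual systems.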

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
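
The distinction between modified and new pages can be illustrated with a minimal sketch that compares two snapshots of page contents. The snapshots and URLs are hypothetical; an actual alert service would fetch the pages or query an Internet index at regular intervals.

```python
import hashlib

def fingerprint(content):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots {url: page text} and classify each URL."""
    report = {"new": [], "modified": [], "unchanged": []}
    for url, content in current.items():
        if url not in previous:
            report["new"].append(url)
        elif fingerprint(previous[url]) != fingerprint(content):
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report

# Hypothetical snapshots taken at two different moments.
last_week = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About this site",
}
today = {
    "http://example.org/news": "Conference announced, programme online",
    "http://example.org/about": "About this site",
    "http://example.org/report": "New report on open access",
}
report = detect_changes(last_week, today)
print(report)
```

Storing only the fingerprint of each page, instead of the full text, would be enough to detect a modification at the next visit.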

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
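
How a stored keyword profile could be matched against newly published books can be sketched as follows. The book records and the function name are hypothetical, and this does not describe how Amazon implements its service.

```python
def matching_books(profile_keywords, new_books):
    """Return titles of new books that contain at least one profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    matches = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        if keywords & title_words:
            matches.append(book["title"])
    return matches

# Hypothetical interest profile and list of newly published books.
profile = ["oceanography", "fisheries"]
new_books = [
    {"title": "Introduction to Physical Oceanography"},
    {"title": "Civil Engineering Handbook"},
    {"title": "Managing Coastal Fisheries"},
]
print(matching_books(profile, new_books))
```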

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Searching with a classification or thesaurus is not offered.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term, a thesaurus can point to:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these to formulate a query for
that database (to increase recall and precision).
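
This use of a thesaurus before searching can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the miniature thesaurus, its entries and the function name `expand_query` are hypothetical, and the `OR` query syntax differs from one search system to another.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(expand_query("sea"))
```

Adding synonyms and narrower terms tends to increase recall; adding broader or related terms as well (`relations=("UF", "NT", "BT", "RT")`) widens the search further, at the risk of lower precision.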

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
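
The two directions of citation searching can be sketched with a small data structure. This is a minimal illustration only: the citation data and the names `REFERENCES`, `snowball` and `cited_by` are hypothetical and do not correspond to any real citation index.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1998"],
    "paper_1998": ["paper_1995"],
    "paper_1995": [],
}

def snowball(seed, depth=2):
    """Backward search: follow reference lists from the seed, `depth` levels deep."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in REFERENCES.get(doc, [])} - found
        found |= frontier
    return found

def cited_by(seed):
    """Forward search, as offered by a citation index: documents citing the seed."""
    return sorted(doc for doc, refs in REFERENCES.items() if seed in refs)

print(sorted(snowball("seed_2000")))  # older publications
print(cited_by("seed_2000"))          # more recent publications
```

The backward search needs only the seed document and the documents it cites, while the forward search needs an index over the reference lists of many documents, which is why it depends on a citation index.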

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
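
Generating these variants can be sketched as below. This is only an illustration: the function name `url_variants` is hypothetical, and the assumption that the default page is named `index.html` or `index.htm` does not hold for every web server.

```python
def url_variants(url):
    """Return the forms under which the same page may be linked.

    Assumes the default page is called index.html or index.htm,
    which is common but not universal.
    """
    base = url.rstrip("/")
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            # Strip the file name and the slash in front of it.
            base = base[: -len(index_name) - 1]
            break
    return [base, base + "/", base + "/index.html"]

for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, in the query syntax of the chosen search system, and the results combined.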

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
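
The simplified idea stated above (more links received, higher rank) can be sketched as follows. This is not Google's actual link analysis, which is more elaborate; the link data and the function name `rank_by_inlinks` are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical link data: (linking page, linked site).
LINKS = [
    ("page1", "site_a"), ("page2", "site_a"), ("page3", "site_b"),
    ("page4", "site_a"), ("page5", "site_c"), ("page6", "site_c"),
]

def rank_by_inlinks(sites, links):
    """Order sites by the number of links they receive, most linked first."""
    counts = Counter(target for _, target in links)
    # Ties are broken alphabetically, as in a plain directory listing.
    return sorted(sites, key=lambda site: (-counts[site], site))

print(rank_by_inlinks(["site_b", "site_c", "site_a"], LINKS))
```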

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW,
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
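
The two components described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical set of pages (the example.org URLs and texts are invented) in place of what a real "robot" would collect, and it shows only the general idea of an index built from page texts, not the implementation of any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine science and oceanography resources",
    "http://example.org/b": "Civil engineering resources and databases",
    "http://example.org/c": "Oceanography databases free of charge",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "oceanography databases")))
# prints ['http://example.org/c']
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is why such systems can respond quickly.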

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a query could contain a maximum of 10 words.)
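
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched with a simplified iterative link score. This is only an illustration in the spirit of link-based ranking, not Google's actual algorithm; the page names A to E, the damping value and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages from the links that point to them.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link structure: A and B point to C; C and E point to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["D"]}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph, C and D both receive two links, but D scores higher because one of its links comes from C, which is itself well linked.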

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
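
What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical (the synonym data used by Google is not described in these slides), and the OR-group notation is only one possible way to write the expanded query.

```python
# Hypothetical synonym table, invented for this illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            # Words without known synonyms are kept as they are.
            parts.append("(" + " OR ".join(group) + ")" if len(group) > 1 else word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
# prints (cheap OR inexpensive OR affordable) car rental
```

Only the word marked with the tilde is expanded; "car" stays unchanged even though the table contains synonyms for it.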

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company,
Yahoo!, since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages.
(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
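
Recall and precision can be made concrete with a small worked example. The sketch below assumes the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents actually relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints precision = 0.50, recall = 0.67
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50), and 2 of the 3 relevant documents were found (recall 0.67).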

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
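
For readers unfamiliar with truncation, the following minimal sketch shows what a system that supports it does with a query term ending in an asterisk. The asterisk convention and the sample text are assumptions for illustration; the syntax differs between retrieval systems.

```python
import re

def truncation_to_regex(term):
    """Turn a term, optionally ending in *, into a whole-word regular expression."""
    if term.endswith("*"):
        # Truncated term: match any word that starts with the given stem.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    # Plain term: match that exact word only.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matching_words(term, text):
    """Return the words in text that the (possibly truncated) term matches."""
    return truncation_to_regex(term).findall(text)

TEXT = "The library serves librarians; many libraries cooperate."
print(matching_words("librar*", TEXT))
# prints ['library', 'librarians', 'libraries']
print(matching_words("library", TEXT))
# prints ['library']
```

Without truncation or stemming, the searcher has to enter each word form separately to reach the same coverage.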

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
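
What a proximity operator such as NEAR does can be sketched in a few lines. The default distance of 5 words is an arbitrary assumption; real systems define their own limits and syntax.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

SENTENCE = "Open access to scientific information"
print(near(SENTENCE, "open", "information"))
# prints True
print(near(SENTENCE, "open", "information", max_distance=2))
# prints False
```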

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
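
Clustering on the fly can be illustrated with a deliberately naive sketch that groups result snippets under the term they share with the most other snippets. This is not Vivisimo's method, which is not described here; the stopword list and the snippets (built around the "pascal" example used for Teoma) are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared term they contain."""
    query_words = set(re.findall(r"\w+", query.lower()))
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        term_sets.append(words - STOPWORDS - query_words)
    # Count in how many snippets each candidate term occurs.
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        # Label: the most widely shared term (ties broken alphabetically).
        label = min(terms, key=lambda t: (-doc_freq[t], t)) if terms else "other"
        clusters[label].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher on faith",
]
for label, group in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", group)
```

The grouping is computed only from the snippets returned for this one query, so nothing has to be prepared before the search.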

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
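
A minimal sketch of such an integration step: ranked result lists from two hypothetical primary engines are interleaved, and repeated results are removed. The URL normalisation rule (ignore "www.", letter case of the host, trailing slash and query string) is an assumption; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', plus path."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # The query string is ignored in this simplified key.
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each page."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
ENGINE_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
ENGINE_2 = ["http://dogpile.com", "http://www.kartoo.com/", "http://www.vivisimo.com/"]

print(merge_results([ENGINE_1, ENGINE_2]))
# prints ['http://www.dogpile.com/', 'http://vivisimo.com/', 'http://www.kartoo.com/']
```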

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
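
The core of such a tracking service can be sketched as a comparison of stored fingerprints with the current contents of the watched pages. Fetching the pages and sending the e-mail alerts are left out; the function names, the example.org URLs and the use of SHA-256 fingerprints are assumptions for illustration.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(stored, current_pages):
    """Compare stored fingerprints with the current page contents.

    Returns (new_urls, changed_urls) and updates `stored` in place,
    so that the next run only reports later changes.
    """
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls

# Hypothetical snapshots of watched pages on two successive visits.
stored = {}
print(detect_changes(stored, {"http://example.org/news": "version 1"}))
# prints (['http://example.org/news'], [])
print(detect_changes(stored, {"http://example.org/news": "version 2",
                              "http://example.org/new-page": "hello"}))
# prints (['http://example.org/new-page'], ['http://example.org/news'])
```

Storing only fingerprints keeps the service's memory small, at the cost of not being able to show what exactly changed.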

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
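
How a stored interest profile can be matched against newly published books is sketched below. The profile structure, the book records and the matching rule (any keyword in the title, or a matching subject category) are assumptions for illustration and do not describe the system of any particular bookshop.

```python
def matches_profile(book, profile):
    """Return True if the book fits the user's keywords or subject categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words
                      for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

def new_book_alerts(new_books, profile):
    """Return the titles of the new books the user should be alerted about."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]

# Hypothetical interest profile and list of newly published books.
PROFILE = {"keywords": ["oceanography"], "categories": ["Library science"]}
NEW_BOOKS = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Managing Digital Collections", "category": "Library science"},
    {"title": "Cooking for Beginners", "category": "Food"},
]
print(new_book_alerts(NEW_BOOKS, PROFILE))
# prints ['Introduction to Oceanography', 'Managing Digital Collections']
```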

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
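
Simultaneous searching, and its dependence on the availability of the target systems, can be sketched as follows. The three target functions are stand-ins invented for this example (a real service would send queries over the network, for instance via Z39.50); one of them fails on purpose to show that the remaining results are still returned.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target systems; real ones would be queried over the network.
def search_library_a(query):
    return [f"{query}: record from Library A"]

def search_bookshop_b(query):
    raise ConnectionError("target system unavailable")

def search_library_c(query):
    return [f"{query}: record from Library C"]

TARGETS = {
    "Library A": search_library_a,
    "Bookshop B": search_bookshop_b,
    "Library C": search_library_c,
}

def search_all(query, targets):
    """Query all targets in parallel; return (results, names of failed targets)."""
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        futures = {name: pool.submit(func, query)
                   for name, func in targets.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception:
                # An unavailable target must not block the other results.
                failed.append(name)
    return results, failed

print(search_all("oceanography", TARGETS))
```

Because only the common denominator of all targets (here a single query string) is sent, the search options of such a service are more limited than those of each individual catalogue.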

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations of a term with other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
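
The procedure above (look up extra terms first, then formulate the query) can be sketched as follows. The tiny `THESAURUS` dictionary and its entries are invented for illustration, and the `OR` syntax is an assumption: the query syntax differs from one search system to another.

```python
# A tiny, made-up thesaurus: each term maps relation codes to other terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "NT": ["coastal waters"],   # narrower terms
        "BT": ["water body"],       # broader terms
        "RT": ["tide"],             # related terms
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the terms found through the given relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms


def to_query(terms):
    """Join terms with OR and quote multi-word terms as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms mainly increases recall. Broader and related terms are left out by default in this sketch, because they tend to lower precision.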

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
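
Both directions of citation searching can be sketched with a small dictionary of reference lists. The document identifiers in `REFERENCES` are invented, and a real citation index is of course far larger and built from parsed publications.

```python
from collections import defaultdict

# Made-up data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for ref in references.get(doc, []):
                if ref not in found and ref != seed:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list who cites it."""
    cited_by = defaultdict(list)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].append(doc)
    return cited_by
```

`snowball` needs only the documents themselves, whereas looking forward in time requires the inverted index, which is why that direction depends on a citation database.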

211

***-

Information retrieval:
using citations (scheme)
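[Scheme on a time axis (past, now, future): starting from the
seed document, snowball citation searching leads back to
publications in the past, while citation indexing leads
forward to publications that appeared later.]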

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
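
Generating the variants listed above can be automated. The sketch below is an assumption-laden helper, not a feature of any search engine: the function name is invented, and treating `index.htm` like `index.html` is an extra guess about common default file names.

```python
def url_variants(url):
    """Return the spellings of `url` that are worth trying in a link search."""
    variants = [url]
    base = url
    # Strip a default file name, keeping the trailing slash of the directory.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    with_slash = base if base.endswith("/") else base + "/"
    without_slash = with_slash.rstrip("/")
    for variant in (with_slash, without_slash):
        if variant not in variants:
            variants.append(variant)
    return variants
```

Each variant is then submitted as a separate link query, and the result lists are combined by hand or by a script.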

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
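
The "simply stated" rule above can be written down directly. Note that this sketch only counts inbound links; the link analysis actually used by Google is more elaborate, and the function name and the sample site names are invented here.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Order the sites of one directory category by number of inbound links.

    `links` is an iterable of (source, target) pairs; duplicate pairs and
    links from a site to itself are ignored.
    """
    inlinks = Counter(target for source, target in set(links)
                      if source != target)
    # Most-linked first; ties fall back to alphabetical order, as in DMOZ.
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))
```

With no link data at all the function returns the plain alphabetical listing, which corresponds to the sorting of the original DMOZ directory.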

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
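
The two components described above can be sketched in a few lines. This is a minimal illustration, not the design of any particular search engine: the pages are hypothetical, the "robot" is replaced by a ready-made dictionary of collected texts, and the index is a simple inverted index.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") would have collected beforehand.
pages = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Marine engineering handbook",
}
index = build_index(pages)
print(search(index, "marine science"))
```

The query is answered from the prepared index only; the pages themselves are not visited at query time, which is why such a system can respond quickly.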

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
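
Both ranking criteria can be illustrated with a small power-iteration sketch in the style of the published PageRank idea. This is an assumption-laden illustration, not Google's actual system: the four-page graph, the damping value of 0.85 and the handling of pages without outgoing links are all choices made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by link analysis using simple power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A and B both point to C, and C points to D.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(graph)
print(max(scores, key=scores.get))
```

In this graph C scores well because two pages point to it, and D scores highest although only one page points to it, because that one page (C) is itself "important".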

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
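
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the OR-group output format are assumptions made for illustration; how Google implements the tilde operator internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            # Words without a tilde are passed through unchanged.
            parts.append(token)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded: "cheap" stays as it is even though the table knows a synonym for it.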

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
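
Recall and precision are defined relative to the set of relevant documents: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here 2 of the 4 retrieved documents are relevant (precision 2/4) and 2 of the 3 relevant documents were found (recall 2/3).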

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
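
In retrieval systems that do offer truncation, a term such as librar* stands for every indexed word that begins with "librar". The sketch below shows that behaviour on an invented vocabulary; the asterisk as truncation symbol is a common convention and an assumption here.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a (possibly truncated) term."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == term.lower())

# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", vocabulary))
```

Without this feature the searcher has to type the variants explicitly, for example by combining them with OR.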

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
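
A proximity operator such as NEAR requires that two words occur within a certain number of words of each other. A minimal sketch of that test on a single text; the default distance of 5 words is an arbitrary assumption:

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

doc = "Open access repositories are set up by many scientific organisations"
print(near(doc, "access", "repositories", max_distance=2))
print(near(doc, "open", "scientific", max_distance=3))
```

Both pairs of words occur in the document, so a plain AND query matches in both cases; only the proximity test distinguishes the adjacent pair from the distant one.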

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
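
Clustering "on the fly" means that the grouping is computed from the retrieved hits alone. The sketch below is a naive stand-in, not the method used by Vivisimo: it labels groups with terms that occur in more than one hit, and the stop-word list and the result data are invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "by"}

def cluster_results(query, results, max_clusters=3):
    """Group (title, snippet) results under frequent non-query terms."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def terms(text):
        return {w for w in re.findall(r"\w+", text.lower())
                if w not in STOPWORDS and w not in query_words}

    doc_terms = [terms(title + " " + snippet) for title, snippet in results]
    # Count in how many hits each candidate label occurs.
    frequency = Counter(t for ts in doc_terms for t in ts)
    ordered = sorted(frequency.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [t for t, count in ordered if count > 1][:max_clusters]
    clusters = defaultdict(list)
    for (title, _), ts in zip(results, doc_terms):
        label = next((l for l in labels if l in ts), "other")
        clusters[label].append(title)
    return dict(clusters)

results = [
    ("Blaise Pascal biography", "French philosopher and mathematician"),
    ("Pensees", "Work of the philosopher Blaise Pascal"),
    ("Pascal tutorial", "Learn the Pascal programming language"),
    ("Free Pascal compiler", "Compiler for the Pascal programming language"),
]
print(cluster_results("pascal", results))
```

For the ambiguous query "pascal" the hits about the philosopher and the hits about the programming language end up in separate groups, without any processing of the documents before the search.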

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
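
The parallel querying and the merging with removal of repeated results can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists, and the address normalisation is a deliberate simplification (it also lowercases the path, which is not always safe for real URLs).

```python
from concurrent.futures import ThreadPoolExecutor

def normalise(url):
    """Reduce trivially different spellings of one address to a single key."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url

def meta_search(query, engines):
    """Send one query to several engines in parallel and merge the hit lists."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave: all first hits, then all second hits, and so on.
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://www.example.org/", "http://example.org/page1"]

def engine_two(query):
    return ["http://example.org", "http://example.net/page2"]

print(meta_search("oceanography", [engine_one, engine_two]))
```

The first hit of the second engine is dropped because it points to the same address as the first hit of the first engine.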

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
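
The core of such a tracking service can be sketched as a comparison of page fingerprints between two runs. This is a minimal illustration under stated assumptions: the `fetch` function is a stand-in for retrieving a page over HTTP, the tiny "web" is a dictionary, and sending the alert by email is left out.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_pages(fetch, urls, known):
    """Compare current page fingerprints with those of the previous run.

    fetch: function returning the text of a URL (hypothetical stand-in).
    known: dict of url -> fingerprint from the previous run; updated in place.
    Returns (new_urls, changed_urls).
    """
    new, changed = [], []
    for url in urls:
        digest = fingerprint(fetch(url))
        if url not in known:
            new.append(url)
        elif known[url] != digest:
            changed.append(url)
        known[url] = digest
    return new, changed

# Hypothetical web: one page at first, then an update and a new page.
web = {"http://example.org/a": "version 1"}
known = {}
first = check_pages(web.get, list(web), known)
web["http://example.org/a"] = "version 2"
web["http://example.org/b"] = "brand new"
second = check_pages(web.get, list(web), known)
print(first, second)
```

The second run reports page b as new and page a as changed; these two lists are what an alert message to the subscriber would contain.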

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ states that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
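
Matching new books against a stored interest profile can be sketched as follows. The record layout (a title plus a list of subject categories) and the sample data are assumptions made for illustration; they do not describe how Amazon or any other service stores profiles.

```python
def matches_profile(book, profile):
    """True if a book record fits a stored interest profile.

    book: dict with 'title' and 'subjects' (hypothetical record layout).
    profile: dict with optional 'keywords' and 'categories' lists.
    """
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words
                      for k in profile.get("keywords", []))
    category_hit = any(c in book.get("subjects", [])
                       for c in profile.get("categories", []))
    return keyword_hit or category_hit

def alerts(new_books, profile):
    """Return the titles of new books that the user should be told about."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]

new_books = [
    {"title": "Coastal Erosion Management", "subjects": ["Oceanography"]},
    {"title": "Medieval Cooking", "subjects": ["History"]},
    {"title": "Marine Ecology Basics", "subjects": ["Biology"]},
]
profile = {"keywords": ["marine"], "categories": ["Oceanography"]}
print(alerts(new_books, profile))
```

The first book is selected through its subject category and the third through a keyword in its title; the second matches neither part of the profile.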

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
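The idea of one search action over several targets can be sketched as follows. This is a minimal illustration, not the method of any real meta-search service: the function `search_all` and the three stand-in catalogues are hypothetical, and a real service would query remote systems instead of local lists.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, targets):
    """Send one query to several target systems in parallel and merge the hits.

    `targets` maps a (hypothetical) catalogue name to a search function.
    A target that fails is skipped, so the merged result depends on which
    targets are available at that moment.
    """
    def run(item):
        name, search = item
        try:
            return [(name, title) for title in search(query)]
        except Exception:
            return []  # unavailable target: contributes nothing

    with ThreadPoolExecutor() as pool:
        per_target = list(pool.map(run, targets.items()))
    return [hit for hits in per_target for hit in hits]


# Stand-in catalogues for illustration only.
def library_a(query):
    return [t for t in ["Marine Biology", "Ocean Tides"] if query.lower() in t.lower()]


def bookshop_b(query):
    return [t for t in ["Ocean Atlas"] if query.lower() in t.lower()]


def offline_c(query):
    raise ConnectionError("target system not reachable")


results = search_all(
    "ocean",
    {"Library A": library_a, "Bookshop B": bookshop_b, "Library C": offline_c},
)
print(results)
```

Because every target receives the same plain query string, only the search options that all targets share can be used, which is one way to read the limitation noted above.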

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
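The use of a thesaurus before querying a database can be sketched as below. Everything in this example is an assumption made for illustration: the mini-thesaurus and its entries are invented, the function name `expand_query` is hypothetical, and the OR operator and the quotes for phrases are a common but not universal query syntax, so the help pages of the target database remain the reference.

```python
# Hypothetical mini-thesaurus using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus selected thesaurus relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Quote multi-word terms so they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms and narrower terms in this way tends to raise recall, while choosing which relations to include is a way to keep precision under control; the best choice depends on the database and the information need.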

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
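The two directions of citation searching can be sketched with a small data structure. The reference lists below are invented for illustration, and the function names `snowball` and `build_citation_index` are hypothetical; commercial citation indexes are of course built from far larger and more carefully processed data.

```python
# Hypothetical reference lists: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "X": ["seed"],
    "Y": ["seed", "X"],
}


def snowball(doc, references):
    """Follow reference lists backwards in time, collecting all older documents."""
    found, queue = [], list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            # The references of a found document are followed as well.
            queue.extend(references.get(current, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which later ones cite it."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("seed", REFERENCES))
print(build_citation_index(REFERENCES)["seed"])
```

The first call walks from the seed document to older publications (the snow-ball effect); the second looks up the more recent documents that cite the seed, which is only possible once the reference lists have been inverted into an index.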

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with the seed document;
snowball citation searching leads from the seed document to the past,
citation indexing leads towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
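The variants listed above can be generated automatically before running a link search. The following is a minimal sketch: the function name `url_variants` is hypothetical, and the list of default file names is an assumption, since web servers can be configured to use other names.

```python
def url_variants(url):
    """Return common spellings of the same page address, without the scheme.

    Covers three forms: with a default file name, with a trailing slash,
    and without a trailing slash. The default file names checked here
    are an assumption; servers may use others.
    """
    # Drop the scheme ("http:") but keep a leading "//" in the output.
    rest = url.split("//", 1)[-1]
    for default_name in ("index.html", "index.htm"):
        if rest.endswith("/" + default_name):
            rest = rest[: -len(default_name)]
            break
    base = rest.rstrip("/")
    return ["//" + base + "/index.html", "//" + base + "/", "//" + base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link query, and the results combined, so that citations to the same page under a different spelling are not missed.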

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 52

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
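The difference between an alphabetical listing and a link-based listing can be sketched as follows. This is a deliberate simplification that only counts incoming links; the link data, the site names and the function `rank_by_inlinks` are hypothetical, and Google's actual link analysis also weighs each link by the importance of the page it comes from.

```python
from collections import Counter

# Hypothetical link data: (page that contains the link, site it points to).
LINKS = [
    ("p1", "site-b"), ("p2", "site-b"), ("p3", "site-b"),
    ("p1", "site-c"), ("p4", "site-c"),
    ("p2", "site-a"),
]
CATEGORY = ["site-a", "site-b", "site-c"]  # sites in one directory category


def rank_by_inlinks(sites, links):
    """Order sites by the number of links they receive, most linked first."""
    inlinks = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order.
    return sorted(sites, key=lambda s: (-inlinks[s], s))


print(sorted(CATEGORY))                  # alphabetical listing
print(rank_by_inlinks(CATEGORY, LINKS))  # link-based listing
```

In the sample data the most linked site moves to the top of the category, although it would come second in an alphabetical listing.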

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
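
The mechanism described above can be sketched as a small inverted index. This is a minimal illustration under stated assumptions: the pages are given as a dictionary instead of being collected by a robot, the example.org URLs and texts are invented, and all query words are required to match (an implicit AND), which is an assumption and not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages that a robot would have collected (URL -> text).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine engineering",
}

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build the index: each word points to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(search(index, "marine"))
print(search(index, "marine biology"))
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is the point made on the preceding slides.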

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
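
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be sketched with a simplified iterative link analysis. This is an illustrative sketch only: the damping value, the number of iterations and the tiny link graph are assumptions, and the code does not claim to reproduce the algorithm actually used by Google.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["c", "e"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
    "e": [],
}

def link_rank(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score over the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

rank = link_rank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph, page c receives two links and scores higher than a or b, which receive none. Pages d and e each receive exactly one link, but d scores higher because its link comes from the well-linked page c.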

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
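
What such an expansion amounts to can be sketched by rewriting the query by hand. The synonym table below is hypothetical and the OR-group output is only an illustration of the idea; how Google implements the tilde operator internally is not described in the source.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonym known: keep the plain word
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```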

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
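
Recall and precision, mentioned above, follow the standard definitions: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that were retrieved. A minimal sketch, with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant documents exist,
# 2 of the retrieved documents are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```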

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
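
When truncation is not offered, the searcher has to write out the word variants. The sketch below shows what a truncation operator would do; the trailing asterisk as truncation symbol and the small vocabulary are assumptions made for illustration.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a pattern like 'librar*' into all vocabulary words with that stem."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in set(vocabulary) if w.lower().startswith(stem))

# Hypothetical vocabulary of indexed words.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]

variants = expand_truncation("librar*", VOCABULARY)
print(" OR ".join(variants))
```

The resulting OR list is what a user would type by hand in a system without truncation.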

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
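
What a proximity operator such as NEAR checks can be sketched as follows. The default distance of 5 words is an assumption; systems that offer NEAR define their own window.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

TEXT = "open access sources for marine science"
print(near(TEXT, "open", "marine", max_distance=4))
print(near(TEXT, "open", "science", max_distance=3))
```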

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
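
Clustering on the fly can be sketched as grouping the returned snippets under a term they share, computed only from the result list itself. This is a deliberately naive illustration: the snippets, the stop-word list and the labelling rule are assumptions, and the code is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by", "on"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that is not in the query."""
    query_words = set(tokenize(query))
    doc_terms = []
    doc_freq = Counter()
    for snippet in snippets:
        terms = set(tokenize(snippet)) - query_words - STOPWORDS
        doc_terms.append(terms)
        doc_freq.update(terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, doc_terms):
        shared = [t for t in terms if doc_freq[t] > 1]
        # Pick the term shared by most snippets; ties are broken alphabetically.
        label = max(shared, key=lambda t: (doc_freq[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Life of the philosopher Blaise Pascal",
]

for label, group in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", group)
```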

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
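
The integration step with removal of repeated results can be sketched as follows. The result lists are invented, and the normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) is an assumption about when two addresses count as the same result.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a key under which duplicates collide."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")

def merge_results(result_lists):
    """Interleave the ranked lists of several engines and drop repeated results."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_A = ["http://www.example.org/a/", "http://example.org/b"]
ENGINE_B = ["http://example.org/a", "http://example.org/c"]

print(merge_results([ENGINE_A, ENGINE_B]))
```

Interleaving by rank keeps the top hit of every engine near the top of the merged list; other merging strategies are equally possible.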

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
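
The distinction between modified and new pages can be sketched by comparing fingerprints of page contents between two checks. This is a minimal sketch under stated assumptions: the page texts are passed in directly instead of being downloaded, the example.org addresses are invented, and real alert services may work quite differently.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: URL -> fingerprint stored at the last check
    current:  URL -> page text fetched now
    Returns (new_pages, modified_pages).
    """
    new_pages = sorted(url for url in current if url not in previous)
    modified_pages = sorted(
        url for url in current
        if url in previous and previous[url] != fingerprint(current[url])
    )
    return new_pages, modified_pages

# Hypothetical state from the last check and the pages seen today.
PREVIOUS = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("unchanged text"),
}
CURRENT = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "unchanged text",
    "http://example.org/c": "a page that did not exist before",
}

print(detect_changes(PREVIOUS, CURRENT))
```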

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
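
The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration that assumes a profile made of keywords; the profile, the list of new titles and the function name `matching_titles` are hypothetical, and it does not describe how Amazon or any other service actually works.

```python
import re

def matching_titles(profile_keywords, new_titles):
    """Return the new titles that contain at least one profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    hits = []
    for title in new_titles:
        # Split the title into lower-case words before comparing.
        words = set(re.findall(r"[a-z]+", title.lower()))
        if words & keywords:
            hits.append(title)
    return hits

# Hypothetical interest profile and hypothetical list of newly published books.
profile = ["aquaculture", "mangrove"]
new_books = [
    "Mangrove Ecosystems of the World",
    "A History of Printing",
    "Sustainable Aquaculture Practice",
]
alerts = matching_titles(profile, new_books)
print(alerts)
```

A profile based on subject categories would work in the same way, with category codes compared instead of title words.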

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
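
The principle of simultaneous searching can be sketched as follows. This is a minimal illustration only: the three catalogues are hypothetical in-memory lists, and the simple substring search stands for the limited set of search options that all target systems have in common. Real services such as those listed in the examples query remote systems instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogues: name -> list of book titles.
CATALOGUES = {
    "library_a": ["Marine Biology", "Coastal Hydrology"],
    "library_b": ["Marine Biology", "Ocean Currents"],
    "bookdealer_c": [],  # a target system that returns nothing at the moment
}

def search_catalogue(name, term):
    """Substring search in one catalogue, ignoring upper and lower case."""
    return [(name, t) for t in CATALOGUES[name] if term.lower() in t.lower()]

def meta_search(term):
    """Send one query to all catalogues in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(
            pool.map(lambda name: search_catalogue(name, term), CATALOGUES)
        )
    merged = {}
    for results in result_lists:
        for name, title in results:
            # Group by title, so that each title lists the catalogues that hold it.
            merged.setdefault(title, []).append(name)
    return merged

print(meta_search("marine"))
```

The merged result depends on what each target system returns, which is why the availability and functionality of the targets determine the outcome.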

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme with three elements: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
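
The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the thesaurus fragment for "sea" is hypothetical (it is not taken from any of the systems mentioned), and the names `expand_query` and `to_boolean_query` are invented for this example.

```python
# Hypothetical thesaurus fragment using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Return the term plus the terms found through the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms

def to_boolean_query(terms):
    """Combine the terms with OR; multi-word terms are quoted as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = to_boolean_query(expand_query("sea"))
print(query)
```

Adding synonyms and narrower terms in this way is aimed at higher recall; replacing a vague term by one more suitable term from the thesaurus is aimed at higher precision.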

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
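
Both directions of citation searching can be sketched on a small example. The citation data below are hypothetical, and the function names `snowball` and `cited_by` are invented for this illustration; a real citation index holds such data for millions of documents.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "D": ["C"],
    "C": ["A", "B"],  # C is the seed document in this example
    "B": ["A"],
    "A": [],
    "E": ["C", "D"],
}

def snowball(seed):
    """Backward search: follow reference lists from the seed to older work."""
    found, queue = [], list(CITES.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            # The references of each found document are followed in turn.
            queue.extend(CITES.get(doc, []))
    return found

def cited_by(seed):
    """Forward search, as offered by a citation index: who cites the seed?"""
    return sorted(doc for doc, refs in CITES.items() if seed in refs)

print(snowball("C"))
print(cited_by("C"))
```

The backward search needs only the documents themselves, whereas the forward search needs an index built over the reference lists of many later publications.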

211

***-

Information retrieval:
using citations (scheme)
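[Scheme: a time axis from past to future, with "now" marked and the seed document placed on it; snowball citation searching leads from the seed document to the past, citation indexing leads from the seed document towards the future.]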

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
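
The variants listed above can be generated automatically. The sketch below is a minimal illustration that assumes the known address points to a directory or to its index page; the function name `url_variants` is invented for this example, and the query syntax that each search system needs around these strings is not covered.

```python
def url_variants(url):
    """Return the common spellings under which a page may be linked."""
    variants = [url]
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            directory = url[: -len(index_name)]     # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))  # without the trailing slash
            break
    else:
        # No index file name at the end: toggle the trailing slash only.
        if url.endswith("/"):
            variants.append(url.rstrip("/"))
        else:
            variants.append(url + "/")
    return variants

# The placeholder address from the slide above.
for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, and the results can be combined.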

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
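
The difference with searching in hyperlink fields can be sketched on a small example. The three pages below are hypothetical, and the function name `pages_mentioning` is invented for this illustration; a real search engine works on its own index, not on a list like this.

```python
# Hypothetical mini-collection of web pages: address -> page text.
TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
PAGES = {
    "http://a.example/links.html": f'See <a href="{TARGET}">this page</a>.',
    "http://b.example/review.html": f"A review of {TARGET} (no hyperlink).",
    "http://c.example/other.html": "Nothing relevant here.",
}

def pages_mentioning(url_string):
    """Normal text search: pages whose text contains the given string."""
    return sorted(addr for addr, text in PAGES.items() if url_string in text)

print(pages_mentioning(TARGET))
```

In this sketch the second page is found although it contains no hyperlink, which illustrates why normal searching can complement specialised link searching.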

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
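
The "more links received, higher rank" idea can be sketched in a few lines. This is only a minimal illustration, assuming a hypothetical toy set of links and site names; it is not the actual ranking method of Google, which the slides describe only in simplified terms.

```python
from collections import Counter

# Hypothetical toy data: (source site, target site) pairs, one per hyperlink.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("d.example", "c.example"),
]

# Sites listed in one directory category (also hypothetical).
category = ["a.example", "b.example", "c.example"]

def rank_by_inlinks(sites, links):
    """Sort sites by the number of links they receive, highest first."""
    inlinks = Counter(target for _source, target in links)
    # Ties are broken alphabetically so that the order is reproducible.
    return sorted(sites, key=lambda site: (-inlinks[site], site))

ranked = rank_by_inlinks(category, links)
print(ranked)
```

An alphabetical listing would show a.example first; the link-based ordering puts the most linked-to site first instead.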

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
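
The two components described above (an index built from page texts, and a search function over that index) can be sketched as follows. This is a minimal illustration under stated assumptions: the pages are hypothetical and already collected, the crawler itself is left out, and the function names are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the prepared index only; no page is fetched at query time, which is why such systems can respond quickly.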

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a query could contain a maximum of 10 words.)
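
The two ranking rules above (many pages point to a page, and "important" pages point to it) can be sketched with a simplified, PageRank-style iteration. This is only an illustration: the toy graph is hypothetical, the damping value of 0.85 is an assumption, and the real algorithm and parameters used by Google are not given in these slides.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages: a link from a high-scoring page counts for more."""
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score

# Hypothetical toy web: each page maps to the pages it links to.
graph = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
scores = link_scores(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C scores above A and B because two pages point to it, and page D scores above A and B although only one page points to it, because that one page (C) is itself "important".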

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
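
What such an automatic expansion amounts to can be sketched as follows. This is a minimal illustration, assuming a small hypothetical synonym table and an invented function name; how Google actually selects synonyms is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap hotel brussels"))
```

Only the word marked with the tilde is expanded; the other words of the query are left as typed.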

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
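
Recall and precision, the two measures mentioned above, have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In practice the full set of relevant documents on the WWW is unknown, so recall can usually only be estimated.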

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
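
For readers unfamiliar with truncation, the following sketch shows what a retrieval system that does support right-hand truncation would do with a term such as librar*. The vocabulary and the function name are hypothetical, chosen only for illustration.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching index words."""
    if term.endswith("*"):
        stem = term[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    # Without the truncation symbol, only the exact word matches.
    return [term] if term in vocabulary else []

# Hypothetical list of words that occur in an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "retrieval"}
print(expand_truncation("librar*", vocabulary))
```

Without this feature, the searcher has to type the word variants explicitly and combine them with OR.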

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
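
Clustering "on the fly" means that the grouping is computed from the retrieved result texts only, at query time. The following is a deliberately simple sketch of that idea, assuming hypothetical result snippets and a naive rule (group each snippet under the term it shares with the most other snippets); it is not the method used by Vivisimo, which is not described in these slides.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}

def terms(text, query):
    """Return the distinct significant words of a snippet, excluding query words."""
    query_words = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return words - query_words - STOPWORDS

def cluster(query, snippets):
    """Group snippets under the term they share with the most other snippets."""
    term_sets = [terms(snippet, query) for snippet in snippets]
    doc_freq = Counter(term for term_set in term_sets for term in term_set)
    clusters = defaultdict(list)
    for snippet, term_set in zip(snippets, term_sets):
        if term_set:
            # Most widely shared term first; alphabetical order breaks ties.
            label = min(term_set, key=lambda term: (-doc_freq[term], term))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal the philosopher: selected writings",
    "Turbo Pascal programming compiler",
]
clusters = cluster("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

Nothing is prepared in advance: the cluster headings come from the words in the results that happen to be returned for this particular query.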

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
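
The integration step, merging the result lists of several primary search systems and removing repeated results, can be sketched as follows. The result lists are hypothetical and the URL normalisation is a simplification chosen for this example (it ignores the scheme, a leading "www." and any query string); real meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping each page once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.org/fish"]
engine_2 = ["http://example.org", "http://example.net/sea"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Interleaving by rank keeps the top hits of every engine near the top of the merged list, while the normalised key prevents the same page from appearing twice.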

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
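
The core of such a tracking service can be sketched as a comparison of page fingerprints between two checks. This is a minimal illustration under stated assumptions: the page contents are hypothetical, fetching the pages and sending the e-mail alert are left out, and the function names are invented for this example.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def changed_pages(stored, current):
    """Compare stored fingerprints with the current page contents.

    `stored` maps URL -> fingerprint from the previous check;
    `current` maps URL -> page text fetched now.
    Returns (list of modified URLs, list of new URLs).
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in stored:
            new.append(url)
        elif stored[url] != fingerprint(text):
            modified.append(url)
    return modified, new

# Hypothetical data; fetching the pages is left out of this sketch.
stored = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
modified, new = changed_pages(stored, current)
print(modified, new)
```

Storing only a fingerprint per page keeps the service small: the full earlier text is not needed to notice that a page has changed.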

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are even available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
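
A minimal sketch of how such an interest profile could be matched against new book records. The record layout (`title`, `subjects`), the class `InterestProfile` and the sample data are assumptions made for illustration; they do not describe how Amazon or any other service actually works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: keywords and/or subject categories (hypothetical)."""
    keywords: set = field(default_factory=set)   # lower-case words
    subjects: set = field(default_factory=set)   # subject categories / fields


def matches(profile, book):
    """Return True if a new book record fits the interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(profile.keywords & title_words)
    subject_hit = bool(profile.subjects & set(book["subjects"]))
    return keyword_hit or subject_hit


def alerts(profile, new_books):
    """Select the titles of new books that should trigger an alert."""
    return [b["title"] for b in new_books if matches(profile, b)]


# Illustrative data only.
profile = InterestProfile(keywords={"aquaculture"}, subjects={"marine biology"})
new_books = [
    {"title": "Aquaculture in Europe", "subjects": ["fisheries"]},
    {"title": "Coastal Ecosystems", "subjects": ["marine biology"]},
    {"title": "Medieval History", "subjects": ["history"]},
]
print(alerts(profile, new_books))
```

The first book matches on a keyword, the second on a subject category; the third matches neither and triggers no alert.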

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
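
The principle of simultaneous searching can be sketched as follows. The three catalogues are hypothetical in-memory stand-ins; a real meta-search service would send the query to remote target systems (for instance using Z39.50) and would depend on their availability. Only a simple substring match is offered, which mirrors the limited search options mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of libraries and bookdealers.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Freshwater Fish"],
    "bookdealer_c": ["Marine Ecology", "Marine Pollution"],
}


def search_one(name, query):
    """Search a single target system with a simple substring match on titles."""
    q = query.lower()
    return [(title, name) for title in CATALOGUES[name] if q in title.lower()]


def meta_search(query):
    """Send one query to all targets in parallel and merge the hits by title."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_one, name, query) for name in CATALOGUES]
        for future in futures:
            for title, source in future.result():
                merged.setdefault(title, []).append(source)
    return merged


print(meta_search("marine"))
```

Each title appears once in the merged result, together with the list of target systems that hold it.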

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms in a thesaurus:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
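
A minimal sketch of this way of working: first look up a concept in a thesaurus, then build a query from the terms found. The mini-thesaurus, the function names and the Boolean syntax (OR within a concept, AND between concepts) are assumptions made for illustration; the actual query syntax depends on the database to be searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader terms
        "NT": ["bay", "gulf"],      # narrower terms
        "RT": ["coast"],            # related terms
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found via the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        terms.extend(entry.get(rel, []))
    return terms


def build_query(concepts, relations=("UF", "NT")):
    """Combine the variants of each concept with OR, and the concepts with AND."""
    groups = []
    for concept in concepts:
        # Multi-word terms are quoted so that they are searched as a phrase.
        variants = ['"%s"' % t if " " in t else t
                    for t in expand(concept, relations)]
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
# (sea OR ocean OR bay OR gulf) AND (pollution)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term or combining concepts with AND mainly increases precision.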

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
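
Both directions can be illustrated with a small sketch. The reference lists below are hypothetical; a real citation index is built from the reference lists of very many publications.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["old3"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, references):
    """Follow reference lists back in time, starting from the seed document."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


print(sorted(snowball("seed", REFERENCES)))               # older publications
print(sorted(citation_index(REFERENCES)["seed"]))         # more recent, citing publications
```

The snowball search only needs the documents themselves, whereas finding the more recent citing publications requires an index that someone has compiled in advance.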

211

***-

Information retrieval:
using citations (scheme)
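Scheme: on a time axis (past, now, future), snowball citation
searching leads from the seed document back to older publications,
while citation indexing leads forward to more recent publications
that cite the seed document.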

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
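
The variants can be generated rather than typed by hand. The sketch below assumes that the default file name of the site is index.html or index.htm (servers may use other names) and the function name `url_variants` is hypothetical. How each variant is then submitted as a link query depends on the search system used.

```python
from urllib.parse import urlsplit


def url_variants(url):
    """List the common spellings under which one and the same page may be linked."""
    parts = urlsplit(url)
    path = parts.path
    # Drop a trailing default file name (assumption: index.html or index.htm).
    for default in ("index.html", "index.htm"):
        if path.endswith("/" + default):
            path = path[: -len(default)]
            break
    base = "//" + parts.netloc + path.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

The results found for each variant should be combined before counting the links to the page.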

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 54

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
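
The contrast between alphabetical order and link-based order can be sketched as follows. This is only an illustration, assuming a hypothetical list of directory entries and a hypothetical list of (source, target) links; it is not the actual method used by Google.

```python
from collections import Counter

def rank_by_inbound_links(directory_entries, links):
    """Sort directory entries by how many distinct pages link to them."""
    inbound = Counter()
    for source, target in set(links):      # count each distinct link once
        if source != target:               # ignore links from a site to itself
            inbound[target] += 1
    # Most-linked entries first; ties fall back to alphabetical order.
    return sorted(directory_entries, key=lambda site: (-inbound[site], site))

# Hypothetical directory entries and hypothetical links between sites.
entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]
ranked = rank_by_inbound_links(entries, links)
```

With these made-up data the alphabetical order is reversed, because the last entry receives the most links.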

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
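
The two components described above (an index built from the text of collected pages, and a search function that consults only that index) can be sketched in a few lines. This is a minimal sketch: the pages, URLs and function names are hypothetical, and real systems use far more elaborate indexing and ranking.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected earlier.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediment",
    "http://example.org/c": "civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine biology")
```

Note that the query never touches the pages themselves: it is answered from the index alone, which is why results can lag behind the live WWW.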

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
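
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be combined in an iterative calculation in the spirit of the publicly described PageRank idea. The sketch below is a simplified illustration under stated assumptions: the link graph is hypothetical, the damping value of 0.85 is a commonly quoted choice, and none of this should be read as the algorithm Google actually runs.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """graph maps each page to the list of pages it links to."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "hub" is linked to by three pages, "c" by none.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = link_rank(graph)
```

In this made-up graph, "hub" ends with the highest score, and "a" and "b" score higher than "c" because they are pointed to by the highly scored "hub".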

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
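
What such an expansion amounts to can be sketched as a rewrite of the query, in which each word marked with a tilde becomes a group of alternatives. This is only an illustration: the synonym table is hypothetical, and how Google selects its synonyms internally is not documented here.

```python
def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)   # no synonyms known: keep the word as is
        else:
            parts.append(token)
    return " ".join(parts)

# Hypothetical synonym table, for illustration only.
synonyms = {"car": ["automobile", "vehicle"]}
expanded = expand_query("cheap ~car rental", synonyms)
```

The same rewrite can be done by hand, which is what the manual, intellectual expansion of a query comes down to.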

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
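
Recall and precision, as used above, have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A small sketch, with hypothetical document identifiers, shows the calculation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)           # relevant documents that were found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set and hypothetical set of relevant documents.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.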

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
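
When truncation is not available, a searcher can approximate it by listing the word variants explicitly. The sketch below assumes a hypothetical local word list and assumes that the retrieval system accepts OR between terms; it only illustrates the idea of right truncation.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))

# Hypothetical word list; a real one could come from a dictionary or thesaurus.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
variants = expand_truncation("librar*", vocabulary)
or_query = " OR ".join(variants)
```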

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
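
To make clear what a proximity operator such as NEAR would add, the sketch below checks whether two words occur within a given number of words of each other in a text. The function name, the default distance and the sample text are assumptions for illustration; retrieval systems that offer NEAR each define their own distance.

```python
def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

# Hypothetical document text.
text = ("Marine biology studies organisms in the sea. "
        "Pollution of coastal water is a separate topic.")
close_pair = near(text, "marine", "biology", max_distance=2)
distant_pair = near(text, "marine", "water", max_distance=5)
```

A plain AND query would retrieve this text for both word pairs; a proximity operator would accept only the first pair.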

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
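
Clustering "on the fly" means that the grouping is computed from the retrieved results only, after the search. The sketch below is a deliberately crude illustration of that idea: each result title is grouped under its most widely shared word that is not a query word. The stop-word list, the titles and the labelling rule are all assumptions; the method that Vivisimo actually uses is not described in this text.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "by", "for", "of", "the"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(query.lower().split())

    def terms(title):
        return {w for w in title.lower().split()
                if w not in STOP_WORDS and w not in query_words}

    # Count in how many titles each candidate label word occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title))

    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most widely shared word wins; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal compiler for programming beginners",
    "Pensees by the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", titles)
```

With these made-up titles, the results about the philosopher and the results about the programming language end up in two separate groups.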

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
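
The integration step described above can be sketched as a merge of several ranked result lists in which repeated URLs are dropped. The URL normalisation below is deliberately crude and the result lists are hypothetical; real meta-search systems may merge and rank quite differently.

```python
def normalise(url):
    """Crude URL normalisation so trivially different forms compare equal."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeats."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.net/page"]
engine_2 = ["http://example.org", "http://example.com/x"]
merged = merge_results([engine_1, engine_2])
```

Interleaving by rank keeps the top result of every engine near the top of the merged list, which is one simple design choice among several possible ones.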

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
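
One simple way in which such a program or alert service could detect modified and new pages is to store a fingerprint of each page and compare it with the fingerprint of a freshly fetched copy. The sketch below works on page texts that are assumed to have been fetched already; the URLs and contents are hypothetical, and actual services may use other techniques.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with those of freshly fetched pages."""
    report = {"new": [], "changed": [], "unchanged": []}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report

# Hypothetical fingerprints stored during an earlier visit.
previous = {
    "http://example.org/news": fingerprint("old headline"),
    "http://example.org/about": fingerprint("about us"),
}
# Hypothetical page texts fetched today.
current_pages = {
    "http://example.org/news": "new headline",
    "http://example.org/about": "about us",
    "http://example.org/jobs": "vacancies",
}
report = detect_changes(previous, current_pages)
```

The lists in the report could then be sent to the subscriber, for instance by e-mail.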

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
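
The profile matching described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the book records, the profile and the function names are hypothetical, and no real bookshop system is known to work this way.

```python
# Hypothetical sample data: a stored interest profile and newly published books.
PROFILE = {"keywords": ["ocean"], "subjects": ["hydrology"]}

NEW_BOOKS = [
    {"title": "Ocean Currents", "subjects": ["oceanography"]},
    {"title": "Rivers of Europe", "subjects": ["hydrology"]},
    {"title": "Garden Birds", "subjects": ["ornithology"]},
]


def matches_profile(book, profile):
    """True when a profile keyword occurs in the title or a subject category matches."""
    title = book["title"].lower()
    keyword_hit = any(k.lower() in title for k in profile["keywords"])
    category_hit = bool(set(book["subjects"]) & set(profile["subjects"]))
    return keyword_hit or category_hit


def alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


if __name__ == "__main__":
    for title in alerts(NEW_BOOKS, PROFILE):
        print("New book for you:", title)
```

A real service would run such a match each time new records are added and send the result by email.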

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
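
The principle of simultaneous searching can be sketched as follows. This is a minimal illustration, not the method of any system named here: the three catalogues are hypothetical in-memory lists, whereas a real meta-search service sends the query to remote target systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the remote catalogues of libraries and bookdealers.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Management"],
    "bookshop_c": ["Marine Ecology of the North Sea"],
}


def search_one(name, query):
    """Case-insensitive title word search in one catalogue."""
    q = query.lower()
    return [(title, name) for title in CATALOGUES[name] if q in title.lower()]


def meta_search(query):
    """Send one query to all catalogues in parallel and merge the hits per title."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda n: search_one(n, query), CATALOGUES))
    merged = {}
    for hits in result_lists:
        for title, source in hits:
            merged.setdefault(title, []).append(source)
    return merged


if __name__ == "__main__":
    for title, sources in sorted(meta_search("ocean").items()):
        print(title, "->", ", ".join(sources))
```

Only a simple title search is offered because that is the one function every target supports, which illustrates why the search options of such services are rather limited.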

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM                                                      RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic     Library of Congress, British Library, (Amazon)
To search for book titles published before 1990          national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                             Library of Congress, British Library, Infoball
To find the price of a book                              Global Books in Print, Infoball, online bookshops
To be informed regularly about new books                 Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
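
Using thesaurus terms to formulate a query can be sketched as follows. This is a minimal sketch only: the mini-thesaurus, its terms and the function name are hypothetical, and the OR syntax is an assumption that should be checked against the help pages of the database to be searched.

```python
# Hypothetical mini-thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "NT": ["coastal waters"],  # narrower terms
        "BT": ["water body"],      # broader terms
        "RT": ["tide"],            # related terms
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Combine a term with its thesaurus terms into one Boolean OR query."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Terms of several words are quoted so that they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly increases recall; adding broader terms would increase recall further, at the cost of precision.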

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
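
Both directions of citation searching can be sketched as follows. This is a minimal illustration with hypothetical document identifiers; a real citation index is built in the same spirit, by inverting the reference lists of many publications.

```python
from collections import defaultdict

# Hypothetical documents, each with the list of earlier documents that it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Follow reference lists backwards in time (snowball effect)."""
    found, stack = set(), [doc]
    while stack:
        for ref in references.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = defaultdict(set)
    for citing, refs in references.items():
        for ref in refs:
            cited_by[ref].add(citing)
    return cited_by


if __name__ == "__main__":
    print("Older publications:", sorted(snowball("seed", REFERENCES)))
    index = build_citation_index(REFERENCES)
    print("More recent publications:", sorted(index["seed"]))
```

The number of entries found for a document in the inverted index is the number of citations that it has received within the covered documents.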

211

***-

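Information retrieval:
using citations (scheme)
Time axis: Past ← Seed document (now) → Future
»Snowball citation searching: from the seed document back to
older publications (past)
»Citation indexing: from the seed document forward to more
recent publications (future)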

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
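
Generating the variants listed above can be sketched as follows. The function name is hypothetical, and only the index.html variant from the list is handled; each resulting variant still has to be combined with the link-search syntax of the chosen search system.

```python
def url_variants(url):
    """Return the spellings under which the same web site may be linked."""
    base = url.rstrip("/")
    index_name = "/index.html"
    if base.endswith(index_name):
        base = base[: -len(index_name)]
    # Same three forms as in the list above: with index page, with and without slash.
    return [base + index_name, base + "/", base]


if __name__ == "__main__":
    for variant in url_variants("//web-server-computer.country/website/"):
        print(variant)
```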

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 55


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
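
Browsing such a tree structure can be sketched as follows. This is a minimal illustration: the categories and site names are hypothetical and do not reproduce the classification of any real directory.

```python
# Hypothetical fragment of a subject directory; names are illustrative only.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine Biology": ["site-a.example", "site-b.example"]},
        "Earth Sciences": {"Hydrology": ["site-c.example"]},
    },
}


def browse(tree, path):
    """Walk down the tree along the chosen categories and list what is there."""
    node = tree
    for category in path:
        node = node[category]
    # Inner nodes are subcategories (dict); leaves are lists of selected sites.
    return sorted(node) if isinstance(node, dict) else list(node)


if __name__ == "__main__":
    print(browse(DIRECTORY, []))
    print(browse(DIRECTORY, ["Science"]))
    print(browse(DIRECTORY, ["Science", "Biology", "Marine Biology"]))
```

No query has to be formulated: the user only chooses one category at each level.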

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
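The "more links received, higher rank" idea can be sketched as follows. This is only an illustration of counting inbound links, not the actual method used by Google; the site names, the link list and the function name `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by the number of distinct pages that link to them."""
    inlinks = Counter()
    for source, target in set(links):      # count each (source, target) pair once
        if source != target:               # ignore self-links
            inlinks[target] += 1
    # Ties fall back to alphabetical order, as in a plain alphabetical directory.
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))

# Hypothetical sites and links, for illustration only.
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]
print(rank_by_inlinks(ENTRIES, LINKS))
# ['gamma.example', 'beta.example', 'alpha.example']
```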

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW
a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
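The mechanism described above (collected pages, an automatically built index, and a query interface on top of it) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the pages in `PAGES` are hypothetical and stand in for what a robot would collect, and combining the query words with AND is a choice made for this sketch only.

```python
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "open access journals in biology",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
# ['http://example.org/a']
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is why such systems can respond quickly.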

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
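The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-score iteration in the style of the published PageRank idea. This is a sketch only and makes no claim about how Google ranks in practice; the graph, the damping value 0.85 and the function name `link_scores` are assumptions for illustration.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores high when many pages,
    or high-scoring pages, link to it. Every link target must also
    appear as a key of `graph`."""
    pages = list(graph)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * scores[page] / n
        scores = new
    return scores

# Hypothetical link graph: page -> pages it links to.
GRAPH = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
SCORES = link_scores(GRAPH)
print(sorted(SCORES, key=SCORES.get, reverse=True))
```

In this small graph "hub" receives links from three pages and ends up first, while "c", which receives none, ends up last.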

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
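What such an expansion amounts to can be sketched as follows: each word marked with a tilde is replaced by an OR-group of the word and its synonyms. The synonym table and the function name `expand_query` are hypothetical; how Google selects synonyms internally is not known from this text.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)      # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
# (cheap OR inexpensive OR affordable) car rental
```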

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
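Why a second index helps can be shown with simple set arithmetic. The page identifiers and index contents below are invented for illustration; real coverage figures are not given here.

```python
def coverage_report(web, indexes):
    """Return the fraction of the known pages covered by each index
    and by all indexes together."""
    if not web:
        raise ValueError("the set of known pages must not be empty")
    report = {name: len(pages & web) / len(web) for name, pages in indexes.items()}
    combined = set().union(*indexes.values()) & web
    report["combined"] = len(combined) / len(web)
    return report

# Hypothetical data: 8 known pages and two partly overlapping indexes.
WEB = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
INDEXES = {
    "index_a": {"p1", "p2", "p3", "p4"},
    "index_b": {"p3", "p4", "p5", "p6"},
}
print(coverage_report(WEB, INDEXES))
# {'index_a': 0.5, 'index_b': 0.5, 'combined': 0.75}
```

Each index alone covers half of the pages in this example, while both together cover three quarters, because their contents overlap only partly.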

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
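Recall and precision, as mentioned above, are standard retrieval measures: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A small worked example with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 5 relevant documents exist,
# 2 of the retrieved documents are relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d3", "d5", "d6", "d7"]
print(precision_recall(RETRIEVED, RELEVANT))
# (0.5, 0.4)
```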

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
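For readers unfamiliar with truncation: in systems that support it, a truncated term such as librar* matches every word starting with that stem. A minimal sketch of the idea, using a hypothetical word list and the wildcard matching of the Python standard library:

```python
import fnmatch

def truncation_search(pattern, vocabulary):
    """Return the vocabulary words that match a truncated term such as 'librar*'."""
    return sorted(
        word for word in vocabulary
        if fnmatch.fnmatchcase(word.lower(), pattern.lower())
    )

# Hypothetical index vocabulary.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "librarians"]
print(truncation_search("librar*", VOCABULARY))
# ['librarian', 'librarians', 'libraries', 'library']
```

Without this feature, the searcher has to type the variants explicitly and combine them in the query.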

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
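What a proximity operator such as NEAR does in systems that offer it can be sketched as a check on word positions. The window of 3 words and the function name `near` are assumptions for illustration; actual systems define their own window sizes.

```python
def near(text, term_a, term_b, max_distance=3):
    """Return True when term_a and term_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

TEXT = "Marine biology studies life in the sea; hydrology studies water."
print(near(TEXT, "sea", "hydrology"))              # True: adjacent words
print(near(TEXT, "marine", "sea"))                 # False: 6 words apart
print(near(TEXT, "marine", "sea", max_distance=6)) # True with a wider window
```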

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
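Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, at query time. The sketch below groups result snippets by a word they share with at least one other snippet. It is a deliberately naive illustration and not the Vivisimo algorithm, which is not described here; the results, the stop word list and the function name `cluster_results` are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "in", "of", "the"}

def cluster_results(query, results):
    """Group (title, snippet) results under a label word shared by several snippets."""
    query_words = set(query.lower().split())

    def terms(snippet):
        return {
            word for word in snippet.lower().split()
            if word not in STOPWORDS and word not in query_words
        }

    # Count in how many snippets each word occurs.
    doc_freq = Counter()
    for _, snippet in results:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for title, snippet in results:
        shared = [word for word in terms(snippet) if doc_freq[word] >= 2]
        # Prefer the most widely shared word; break ties alphabetically.
        label = min(shared, key=lambda w: (-doc_freq[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical results for the ambiguous query "pascal".
RESULTS = [
    ("Blaise Pascal", "french philosopher and mathematician pascal"),
    ("Pensees", "philosopher pascal wrote the pensees"),
    ("Turbo Pascal", "pascal programming language compiler"),
    ("Free Pascal", "open source pascal programming compiler"),
]
print(cluster_results("pascal", RESULTS))
# {'philosopher': ['Blaise Pascal', 'Pensees'], 'compiler': ['Turbo Pascal', 'Free Pascal']}
```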

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
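The integration step with removal of repeated results can be sketched as follows: the query is sent to several search systems in parallel, the result lists are interleaved by rank, and a URL that was already seen is dropped. The two engine functions are stand-ins that return fixed, invented results; the URL normalisation is intentionally crude and only an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b/", "http://EXAMPLE.org/c"]

def normalise(url):
    """Crude key for spotting repeats: ignore case and a trailing slash."""
    return url.rstrip("/").lower()

def meta_search(query, engines):
    """Query all engines in parallel, then merge by rank and drop repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max(len(results) for results in result_lists)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

print(meta_search("marine biology", [engine_one, engine_two]))
# ['http://example.org/a', 'http://example.org/b/', 'http://EXAMPLE.org/c']
```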

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Elements shown in the scheme:
• telnet, ftp, ...
• Internet / WWW
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
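The core of such tracking can be sketched as a comparison of page fingerprints between two visits. The sketch works on page contents passed in as text, so no network access is shown; the URLs, the storage in a plain dictionary and the function names are assumptions for illustration, and a real alert service would add fetching, scheduling and e-mail delivery.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(stored, current_pages):
    """Compare current page contents with the stored fingerprints.

    Returns (new_urls, changed_urls) and updates `stored` in place."""
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls

# Hypothetical pages on two successive visits.
STORED = {}
print(detect_changes(STORED, {"http://example.org/a": "version 1"}))
# (['http://example.org/a'], [])
print(detect_changes(STORED, {"http://example.org/a": "version 2",
                              "http://example.org/b": "first text"}))
# (['http://example.org/b'], ['http://example.org/a'])
```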

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
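
The matching step behind such an alerting service can be sketched in a few lines. This is only an illustration under stated assumptions: the `InterestProfile` structure, the function names and the sample books are hypothetical and do not describe how Amazon or any other provider implements its service.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user's stored interests (hypothetical structure, for illustration only)."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, category):
    """Return True when a new book fits the profile by keyword or by category."""
    # Split the title into lower-case words, dropping surrounding punctuation.
    title_words = {word.strip(".,:;!?()").lower() for word in title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


def alerts(profile, new_books):
    """Select the (title, category) pairs worth announcing to the user."""
    return [book for book in new_books if matches(profile, *book)]
```

A profile with the keyword "aquaculture" and the category "Marine science" would, in this sketch, select a new book titled "Aquaculture: An Introduction" (keyword match) and any new book filed under "Marine science" (category match).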

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
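
The principle of simultaneous searching can be sketched as follows. The catalogues here are hypothetical in-memory lists standing in for remote library and bookshop systems; a real meta-search service would send the query over the network to each target and would have to cope with targets that are slow or unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalogues; a real meta-search
# service would send the query to each target system over the network.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Ecology", "Ocean Chemistry"],
    "bookshop_c": ["Coastal Management", "Marine Mammals"],
}


def search_one(name, query):
    """Search a single catalogue with a case-insensitive substring match."""
    return [(title, name) for title in CATALOGUES[name]
            if query.lower() in title.lower()]


def meta_search(query):
    """Query all catalogues in parallel and merge hits, grouping duplicate titles."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of the catalogue names.
        for hits in pool.map(lambda name: search_one(name, query), CATALOGUES):
            for title, source in hits:
                merged.setdefault(title, []).append(source)
    return merged
```

The sketch uses plain substring matching because that is the one search option every target can be assumed to support, which mirrors why the search options of meta-search services tend to be limited.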

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
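
The query-expansion step described above can be sketched as follows. The thesaurus entry, the choice of relations and the OR syntax are all assumptions made for illustration; a real thesaurus will list different terms, and the target database may require a different query syntax (check its help pages).

```python
# A tiny hypothetical thesaurus: each term maps relation codes to other terms.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms


def build_query(term, relations=("UF", "NT")):
    """Combine the expanded terms with OR; multi-word terms are quoted as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in expand(term, relations)]
    return " OR ".join(quoted)
```

Adding synonyms (UF) and narrower terms (NT) mainly raises recall; leaving out broader terms (BT) by default is a choice made here to avoid lowering precision.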

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
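
Both directions of citation searching can be illustrated with a small sketch. The citation data below is invented for the example; real citation indexes hold millions of such links, and the function names are hypothetical.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "x": ["seed"],
    "y": ["x", "a"],
}


def snowball(seed):
    """Follow reference lists backwards in time, collecting older publications."""
    found, to_visit = [], list(REFERENCES.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The references of each found document are followed in turn.
            to_visit.extend(REFERENCES.get(doc, []))
    return found


def citing(target):
    """What a citation index provides: the later documents that cite the target."""
    return sorted(doc for doc, refs in REFERENCES.items() if target in refs)
```

In this sketch `snowball("seed")` walks back to the older documents a, b and c, while `citing("seed")` looks forward and finds x; `len(citing(doc))` gives the number of citations a document has received.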

211

***-

Information retrieval:
using citations (scheme)
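[Scheme: a time axis running from past to future, with the seed document at its centre. Snowball citation searching leads from the seed document back into the past; citation indexing leads from the seed document forward, towards now and the future.]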

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
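
Generating these variants can be sketched as follows. The function name is hypothetical and the index file name `index.html` is an assumption (servers may use other names); the resulting strings still have to be placed in whatever link-search syntax the chosen search system documents.

```python
def url_variants(url):
    """List the common spellings under which one page may be linked.

    Covers only the three variants named above: with the index file,
    with a trailing slash, and without it.
    """
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Running each variant through the link search and merging the results reduces the chance of missing citing pages that used a different spelling of the same address.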

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 56


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
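
The two components described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the three pages and their example.org addresses are hypothetical, the "robot" is replaced by a ready-made dictionary of page texts, and real systems add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical mini "crawl": URL -> page text
# (a real robot would collect these pages over the Internet).
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data and marine science links",
    "http://example.org/c": "Civil engineering handbook",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words (implicit Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can be fast but also out of date.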

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
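
The two ranking factors can be illustrated with a simplified PageRank-style calculation. This is a sketch and not Google's actual implementation: the damping value, the number of iterations and the toy web of four pages are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score per page.

    links maps each page to the list of pages it points to.
    A page passes its score on to the pages it links to, so a link
    from a highly ranked page counts for more than a link from an obscure one.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical toy web: A and B both point to C, and C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
RANKS = link_rank(LINKS)
for page in sorted(sorted(RANKS), key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy web, C scores above A and B because two pages point to it, and D scores well with a single incoming link because that link comes from the "important" page C.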

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
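
What such an automatic expansion amounts to can be sketched as follows. The miniature thesaurus and the OR notation are assumptions for illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical miniature thesaurus; a real one would be far larger.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, thesaurus=THESAURUS):
    """Replace every word that is prefixed with "~" by an OR-group of its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + thesaurus.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(term)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
# prints: (cheap OR inexpensive OR budget) car rental
```

Only the marked word is broadened; the other words in the query stay as typed.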

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
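
Recall and precision can be computed for a test search when the set of relevant documents is known. The sketch below uses the standard definitions; the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(f"precision = {p:.2f}, recall = {r:.2f}")
# prints: precision = 0.50, recall = 0.67
```

A specialised system can score better on both measures because it indexes fewer irrelevant pages (precision) and covers its subject more completely (recall).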

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
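
For comparison, this is what manual truncation does in retrieval systems that offer it. The sketch assumes the common convention that an asterisk stands for any word ending; the function names are illustrative only.

```python
import re

def truncation_to_regex(pattern):
    """Translate a truncated search term such as "librar*" into a regular expression."""
    escaped = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)

def find_matches(pattern, text):
    """Return all words in the text that match the truncated term."""
    return truncation_to_regex(pattern).findall(text)

print(find_matches("librar*", "Libraries and librarians use the library."))
# prints: ['Libraries', 'librarians', 'library']
```

Without truncation, the searcher has to type each word variant separately and combine them with OR.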

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
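
A proximity operator such as NEAR can be sketched as follows, for systems that do offer it. The maximum distance of 5 words is an assumption; each system defines its own limit.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

TEXT = "Marine pollution is a major topic in coastal biology research"
print(near(TEXT, "marine", "pollution", max_distance=2))  # adjacent words
print(near(TEXT, "marine", "biology", max_distance=3))    # 8 words apart
```

A plain AND search would retrieve this text for "marine" and "biology" even though the two words are far apart and probably unrelated.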

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
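
Clustering on the fly can be pictured with a deliberately naive sketch that works only on the titles of the retrieved hits. This is not Vivisimo's algorithm, which is not described here; the stopword list and the result titles are hypothetical, and the "pascal" query is the ambiguity example used earlier for Teoma.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word.

    Naive stand-in for real clustering: each title goes under the word
    (other than query words and stopwords) that occurs in the most titles.
    """
    query_words = set(query.lower().split())
    tokenized = []
    doc_freq = defaultdict(int)
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower())) - STOPWORDS - query_words
        tokenized.append((title, words))
        for word in words:
            doc_freq[word] += 1
    clusters = defaultdict(list)
    for title, words in tokenized:
        if words:
            # Most widely shared word first; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical hits for the ambiguous query "pascal".
RESULTS = [
    "Pascal programming language tutorial",
    "Free Pascal compiler for programming",
    "Blaise Pascal philosopher biography",
    "Pascal the philosopher and mathematician",
]
for label, group in cluster_results("pascal", RESULTS).items():
    print(label, "->", group)
```

Everything is computed from the hit list after the search, so no document has to be classified in advance.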

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
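
Integrating results with removal of repeats can be sketched as follows. The interleaving order and the URL normalisation rules (ignoring "www." and a trailing slash) are assumptions for illustration, and the example addresses are hypothetical; actual meta-search systems each use their own merging rules.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key: lower-case host without "www.",
    path without trailing slash, plus the query string if there is one."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked hit lists from two primary search systems.
ENGINE_1 = ["http://www.example.org/a/", "http://example.com/b/"]
ENGINE_2 = ["http://example.com/b", "http://example.net/c", "http://example.org/a"]
print(merge_results([ENGINE_1, ENGINE_2]))
# prints: ['http://www.example.org/a/', 'http://example.com/b', 'http://example.net/c']
```

Five hits from two systems are reduced to three distinct pages, with the top hits of each system kept near the top of the merged list.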

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
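
How a tracking system can tell a modified page from a new one can be sketched by storing a fingerprint of each page at every visit. The class and function names are hypothetical, and the page texts are passed in directly instead of being downloaded; how the services mentioned here work internally is not described.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest of the page content, ignoring whitespace differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class PageTracker:
    """Remember the last fingerprint per URL and report new or changed pages."""

    def __init__(self):
        self.known = {}

    def check(self, url, page_text):
        """Return "new", "changed" or "unchanged" for this visit."""
        digest = fingerprint(page_text)
        previous = self.known.get(url)
        self.known[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"

tracker = PageTracker()
print(tracker.check("http://example.org/news", "Conference in Oostende"))
print(tracker.check("http://example.org/news", "Conference  in Oostende\n"))
print(tracker.check("http://example.org/news", "Conference in Oostende: programme online"))
# prints: new, unchanged, changed (one per line)
```

An alert service would run such a check at regular intervals and send an e-mail whenever the outcome is "new" or "changed".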

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
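
A minimal sketch of matching new book titles against an interest profile stored as keywords. The function names and sample titles are hypothetical, and a real service would probably match on subject categories and descriptions as well as titles.

```python
def matches_profile(title: str, keywords: set) -> bool:
    """True when at least one profile keyword occurs as a word in the title."""
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return any(k.lower() in words for k in keywords)


def new_book_alerts(new_titles: list, keywords: set) -> list:
    """Return the new titles that fit the stored interest profile."""
    return [t for t in new_titles if matches_profile(t, keywords)]
```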

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
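
A minimal sketch of the idea behind such a meta-search service: one query is sent to several target catalogues, failures of individual targets are tolerated, and duplicate records are merged. The target functions and the (ISBN, title) record format are assumptions for illustration; real services talk to the targets over protocols such as Z39.50 or HTTP.

```python
def meta_search(query: str, targets: dict) -> tuple:
    """Send one query to several catalogues and merge the results.

    `targets` maps a catalogue name to a search function that returns a
    list of (isbn, title) tuples, or raises an exception when unavailable.
    Returns (merged records without duplicate ISBNs, names of failed targets).
    """
    merged = {}
    failed = []
    for name, search in targets.items():
        try:
            records = search(query)
        except Exception:
            failed.append(name)  # the result depends on target availability
            continue
        for isbn, title in records:
            merged.setdefault(isbn, title)  # keep the first record per ISBN
    return sorted(merged.items()), failed
```

Only the lowest common denominator of the targets' search features can be passed on in this way, which is one reason why the search options of meta-search services tend to be limited.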

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
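
A minimal sketch of this use of a thesaurus: terms found in the thesaurus are combined into a broader query. The miniature thesaurus, its entries, and the function name are hypothetical, and the sketch assumes that the target database accepts Boolean OR and double-quoted phrases.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal sea"], "BT": ["water body"], "RT": []},
}


def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Build an OR query from a term plus its thesaurus terms.

    Synonyms (UF) and narrower terms (NT) are added by default to raise
    recall; multi-word terms are quoted as exact phrases.
    """
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        terms.extend(entry.get(rel, []))
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding broader terms (BT) as well would raise recall further at the cost of precision, so the choice of relations is left to the searcher.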

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
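
A minimal sketch of the two directions of citation searching, assuming a small table that lists for each document the older documents it cites. The document names and the table are hypothetical; a real citation index is built from the reference lists of very many publications.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc: str, refs: dict) -> set:
    """Follow reference lists backwards in time, starting from `doc`."""
    found, todo = set(), [doc]
    while todo:
        for cited in refs.get(todo.pop(), []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def citing(doc: str, refs: dict) -> set:
    """Look up the documents that cite `doc`, as a citation index does."""
    return {d for d, cited in refs.items() if doc in cited}
```

The size of the set returned by `citing` is the number of citations received by the document within the covered collection.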

211

***-

Information retrieval:
using citations (scheme)
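[Diagram: a time axis running from past to future, with "now" marked and the seed document as starting point. Snowball citation searching leads from the seed document back towards the past; citation indexing leads from the seed document forward towards the future.]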

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
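
A minimal sketch that generates the variants listed above for a given address, so that each can be submitted to the link search. It assumes that the default document of the site is named index.html, as in the slide; the function name is hypothetical.

```python
def url_variants(url: str) -> list:
    """Return the common ways the same page address is written in links."""
    base = url.rstrip("/")
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]
```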

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
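
A minimal sketch of preparing such a search, assuming that the search engine treats a string in double quotes as an exact phrase and takes the query in a parameter named q. The endpoint http://search.example/search is a placeholder, not a real service.

```python
from urllib.parse import quote_plus


def phrase_query(url: str) -> str:
    """Wrap the URL in double quotes so that it is searched as an exact phrase."""
    return '"%s"' % url


def search_link(url: str, endpoint: str = "http://search.example/search?q=") -> str:
    """Build a link for a hypothetical search engine that takes the query in `q`."""
    return endpoint + quote_plus(phrase_query(url))
```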

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
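
The mechanism described above can be illustrated with a minimal sketch. It assumes that the pages have already been collected by the robot; the example URLs and texts are hypothetical, and real Internet indexes are of course far more elaborate. The sketch builds a word index and answers a query by looking up the index instead of visiting the pages again.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a robot would have collected.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the index alone, which is why such systems are fast but can only find pages that the robot has already visited.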

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
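
The idea that both the number and the "importance" of incoming links count can be illustrated with a simplified, PageRank-style iteration. This is only a sketch under stated assumptions: the damping value, the number of iterations and the tiny link graph are hypothetical, and the code is not a description of the algorithm that Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict that maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score in each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph, C scores higher than A and B because two pages point to it, and D also scores higher than A and B although only one page points to it, because that page is the well-linked C.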

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
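
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical and serve only as an illustration; how Google selects synonyms internally is not known here.

```python
# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            # Words without a tilde are passed on unchanged.
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap hotel brussels"))
# prints "(cheap OR inexpensive OR affordable) hotel brussels"
```

Only the word marked with the tilde is expanded; a marked word that is missing from the table is simply kept as a group with one member.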

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
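
Recall and precision can be made concrete with a small calculation: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that were retrieved. The document identifiers below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# and 2 of the retrieved documents belong to the relevant ones.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here the precision is 2/4 and the recall is 2/3. A specialised system aims to raise both values for queries within its own subject.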

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
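
Clustering on the fly can be illustrated with a very simple sketch that groups the retrieved titles under words they share, using only the result list itself. This is a hypothetical illustration and not the method used by Vivisimo; the stop word list and the example titles are assumptions.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group titles under each word that at least two of them share."""
    query_words = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        # Use a set so that a repeated word counts once per title.
        for word in set(title.lower().split()):
            if word not in STOPWORDS and word not in query_words:
                clusters[word].append(title)
    return {label: items for label, items in clusters.items() if len(items) >= 2}

# Hypothetical titles retrieved for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal the philosopher and mathematician",
    "Free Pascal programming compiler",
]
for label, items in sorted(cluster_results("pascal", titles).items()):
    print(label, "->", items)
```

The two headings that result, "philosopher" and "programming", are derived from the hits after the search, so no pre-processing of the documents is needed.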

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
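
The integration step with removal of repeated results can be sketched as follows. The normalisation rule is a deliberate simplification (lowercasing a complete URL is not always safe), and the engine result lists are hypothetical; real meta-search systems may merge and rank in quite different ways.

```python
def normalise(url):
    """Reduce trivial URL variations so that the same page is recognised."""
    url = url.strip().lower()
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping first occurrences."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.vub.ac.be/", "http://bubl.ac.uk/link/"]
engine_2 = ["http://bubl.ac.uk/link", "http://vub.ac.be", "http://www.dmoz.org/"]
print(merge_results([engine_1, engine_2]))
```

Taking one result from each engine in turn keeps the top hits of every source near the top of the merged list.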

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: information accessible through the Internet, and the part covered by Internet indexes]
• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Word files and PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Other Internet services: telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
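
The difference between a modified page and a new page can be sketched with stored fingerprints of page contents. This is only an illustration of the principle: the URLs and texts are hypothetical, the contents are given as literal strings instead of being fetched from the WWW, and actual alert services may work differently.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content (a str)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with freshly collected content.

    previous: dict url -> fingerprint from the last visit
    current_pages: dict url -> content collected now
    Returns (changed_urls, new_urls, updated_fingerprints).
    """
    changed, new, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new.append(url)          # never seen before: a new page
        elif previous[url] != digest:
            changed.append(url)      # seen before, but the content differs
    return changed, new, updated

# Hypothetical state after an earlier visit, followed by a later visit.
old_pages = {"http://example.org/news": "Conference in Oostende"}
stored = {url: fingerprint(text) for url, text in old_pages.items()}
fetched = {
    "http://example.org/news": "Conference in Oostende, May 2005",
    "http://example.org/new-page": "A page that did not exist before",
}
changed, new, stored = detect_changes(stored, fetched)
print("changed:", changed)
print("new:", new)
```

Storing only a fingerprint is enough to notice that a page changed; to show what changed, the service would have to keep the earlier text as well.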

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM                                                     RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic    ?
To find book titles published before 1990               ?
To find a book title through a title search             ?
To find the price of a book                             ?
To be informed regularly about new books                ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
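
As an illustration of the interest profile described above, the following sketch matches newly announced books against stored keywords and subject categories. The record layout, the profile and the book titles are invented for the example; it is not a description of how any bookshop actually implements its alerts.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """Return True when a book fits the stored interest profile.

    A book matches if one of the profile keywords occurs as a word in the
    title, or if the book's subject category is one of the profile categories.
    """
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical profile stored on the server for one user.
profile = {
    "keywords": ["aquaculture", "fisheries"],
    "categories": ["Marine science"],
}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Sustainable Aquaculture Systems", "category": "Agriculture"},
    {"title": "Tides and Currents", "category": "Marine science"},
    {"title": "Medieval Poetry", "category": "Literature"},
]

# Titles that would be mentioned in the alert sent to the user.
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```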

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
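
The idea of simultaneous searching can be sketched as follows: one query is sent to several target systems, the results are merged, and targets that are not available are simply reported. The three search functions below are stand-ins invented for this example; a real meta-search service would contact the catalogues over the network (for instance through Z39.50 or web forms).

```python
def search_library(query):
    """Stand-in for a library catalogue (hypothetical data)."""
    records = [{"isbn": "111", "title": "Marine Ecology"}]
    return [r for r in records if query.lower() in r["title"].lower()]


def search_bookshop(query):
    """Stand-in for a bookshop database (hypothetical data)."""
    records = [
        {"isbn": "111", "title": "Marine Ecology"},
        {"isbn": "222", "title": "Marine Biology"},
        {"isbn": "333", "title": "Garden Birds"},
    ]
    return [r for r in records if query.lower() in r["title"].lower()]


def search_offline(query):
    """Stand-in for a target system that is currently unreachable."""
    raise ConnectionError("target system not available")


def meta_search(query, targets):
    """Send one query to all targets and merge the results by ISBN."""
    merged = {}   # ISBN -> merged record
    failed = []   # names of targets that could not be searched
    for name, search in targets.items():
        try:
            records = search(query)
        except ConnectionError:
            failed.append(name)
            continue
        for rec in records:
            entry = merged.setdefault(rec["isbn"], {**rec, "found_in": []})
            entry["found_in"].append(name)
    return list(merged.values()), failed


results, failed = meta_search(
    "marine",
    {"library": search_library, "bookshop": search_bookshop, "offline": search_offline},
)
```

The sketch also shows why search options are limited: only a query form that every target understands (here a single title word) can be passed on to all of them.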

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM                                                     RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic    Library of Congress, British Library, (Amazon)
To search for book titles published before 1990         national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
Book title search in general                            Library of Congress, British Library, Infoball
To find the price of a book                             Global Books in Print, Infoball, online bookshops
To be informed regularly about new books                Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
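
The procedure described above can be sketched as a small query-expansion step: look up the search term in a thesaurus, collect synonyms (UF), narrower terms (NT) and related terms (RT), and combine them with OR. The thesaurus entry below is invented for illustration, and the OR syntax is an assumption; the query syntax of the target database should be checked in its help pages.

```python
# Hypothetical thesaurus entry; real thesaurus systems offer far more terms.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Build an OR query from a term and the thesaurus terms linked to it.

    Multi-word terms are put between double quotes so that they can be
    searched as exact phrases.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
broader_query = expand_query("sea", THESAURUS, relations=("BT",))
```

Adding synonyms and related terms mainly helps recall; replacing a vague term by a narrower one mainly helps precision.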

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
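
Both directions of citation searching can be illustrated on a tiny, invented citation network. The document identifiers and the reference lists below are hypothetical; a real citation index covers millions of documents.

```python
# Hypothetical data: each document is mapped to the documents it cites.
REFERENCES = {
    "seed": ["A1990", "B1995"],
    "A1990": ["Z1980"],
    "C2003": ["seed"],
    "D2004": ["seed", "C2003"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {r for doc in frontier for r in references.get(doc, [])} - found
        found |= frontier
    return found


def cited_by(seed, references):
    """Look up later documents that cite the seed (what a citation index offers)."""
    return {doc for doc, refs in references.items() if seed in refs}


older = snowball("seed", REFERENCES)    # older publications, via the snowball effect
newer = cited_by("seed", REFERENCES)    # more recent publications citing the seed
citation_count = len(newer)             # number of citations received by the seed
```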

211

***-

Information retrieval:
using citations (scheme)
Scheme (time axis: past, now, future):
starting from the seed document, snowball citation searching leads towards the past;
citation indexing leads towards the future.

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
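
A small sketch of how the variants listed above could be generated before running a link search. The `link:` prefix is only an assumed example of a link-search operator; the actual syntax differs between search systems and should be taken from their help pages.

```python
def url_variants(url: str) -> list:
    """Return the three common spellings of a site's entry URL."""
    base = url
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]   # strip the suffix to get the bare form
            break
    return [base + "/index.html", base + "/", base]


def link_queries(url: str, operator: str = "link:") -> list:
    """Build one link-search query per URL variant (operator name is an assumption)."""
    return [operator + variant for variant in url_variants(url)]


variants = url_variants("http://example.org/website/")
queries = link_queries("http://example.org/website/index.html")
```

Running each query and combining the results avoids missing citing pages that link to a different spelling of the same address.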

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 58

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
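
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any real search engine: the pages are a hypothetical hand-made sample standing in for what a crawler ("robot") would collect, and the function names are assumptions made for this sketch.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what the crawler has collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such systems can respond quickly.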

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
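
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified link-analysis sketch. It is a generic power-iteration scheme over an invented four-page link graph; the damping value, the function name and the graph are assumptions for illustration, and this is not presented as the algorithm Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links pointing to them (simplified power iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sample, C scores highest because three pages point to it, and A scores above B and D because the single link it receives comes from the "important" page C.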

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
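
What such an automatic expansion amounts to can be sketched as follows. The mini-thesaurus and the function name are hypothetical, and the sketch only rewrites the query text into OR groups; how Google selects and applies synonyms internally is not described in the source.

```python
# Hypothetical mini-thesaurus; a real one would hold far more terms.
THESAURUS = {
    "fishing": ["fishery", "angling"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, thesaurus=THESAURUS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + thesaurus.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("sustainable ~fishing north ~sea"))
```

Only the words marked with a tilde are expanded; the other words stay as typed, so the searcher keeps control over which concept is broadened.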

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
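
Recall and precision can be made concrete with a small calculation. In this sketch the document identifiers and the two result sets are invented sample data, and the function name is an assumption; precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 relevant documents exist for the topic.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))
print(precision_recall(specialised_engine, relevant))
```

In this invented sample the specialised system scores higher on both measures, which is the situation the slide describes; real systems have to be compared on real queries.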

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
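
For readers who are not familiar with truncation, the following sketch shows what a manual right-hand truncation operator does in systems that do support it. The asterisk syntax, the function names and the sample documents are assumptions for illustration only.

```python
def matches_truncated(term, word):
    """True if word matches term, where a trailing * means 'any ending'."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def search_truncated(term, documents):
    """Return the documents in which at least one word matches the term."""
    return [
        doc for doc in documents
        if any(matches_truncated(term, w) for w in doc.split())
    ]

# Hypothetical document titles.
documents = [
    "library catalogue",
    "librarians index",
    "public libraries",
    "book databases",
]
print(search_truncated("librar*", documents))
print(search_truncated("library", documents))
```

Without truncation or stemming, the searcher has to type each word variant (library, libraries, librarians...) explicitly to cover the concept.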

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
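
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, after the search. The sketch below illustrates this with a deliberately naive rule: each result title is filed under the word it shares with the most other titles. The rule, the stopword list and the sample titles are assumptions for illustration; the source does not describe the method Vivisimo actually uses.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared word they contain."""
    ignore = STOPWORDS | set(query.lower().split())
    words_per_title = [
        {w for w in re.findall(r"[a-z]+", t.lower()) if w not in ignore}
        for t in titles
    ]
    # Count in how many titles each word occurs.
    frequency = Counter(w for words in words_per_title for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        if words:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal the philosopher: biography",
]
for label, members in cluster_results("pascal", titles).items():
    print(label, members)
```

No index of categories is prepared in advance: the headings emerge from the words in the hits, so they differ from query to query.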

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
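
The integration step, with removal of repeated results, can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists instead of contacting a real search system, and the URL normalisation is a simplification (for instance, it lowercases the whole address, although paths can be case-sensitive).

```python
from concurrent.futures import ThreadPoolExecutor

def normalise(url):
    """Reduce trivial URL variants to one form so duplicates can be spotted."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def meta_search(query, engines):
    """Send the query to all engines in parallel and merge the answers."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank so that each engine's top hits come first.
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical stand-ins for real search systems; they return fixed lists.
def engine_one(query):
    return ["http://www.dogpile.com/", "http://www.kartoo.com"]

def engine_two(query):
    return ["http://dogpile.com", "http://www.mamma.com"]

print(meta_search("marine biology", [engine_one, engine_two]))
```

The queries are sent in parallel threads, which is where the term "multi-threaded search system" comes from; the merged list contains each site only once.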

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
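A minimal sketch of how such a tracking program could detect that a page has been modified, assuming the page text has already been downloaded. The function names and the sample texts are hypothetical; real alert services may well work differently.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the text of a page."""
    # Collapse whitespace so that pure re-formatting is not reported as a change.
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, current_text):
    """Compare the stored fingerprint with the page as it is now."""
    return fingerprint(current_text) != previous_fingerprint


# Hypothetical example: the fingerprint is stored at the first visit
# and compared at each later visit.
stored = fingerprint("Open access   journals\nlist, May 2005")
print(has_changed(stored, "Open access journals list, May 2005"))   # whitespace only
print(has_changed(stored, "Open access journals list, June 2005"))  # real edit
```

Only the fingerprint has to be stored between visits, not the full page, which is one reason why this approach is attractive for a client program.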

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
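The difference between "modified" and "new" can be illustrated with a small sketch that compares two snapshots of search results. It assumes that each snapshot maps a URL to some fingerprint of the page content; the URLs and fingerprint values below are invented for the illustration.

```python
def compare_snapshots(previous, current):
    """Split the current snapshot into new pages and modified pages.

    Both arguments map URL -> content fingerprint (any comparable value).
    """
    new_pages = sorted(url for url in current if url not in previous)
    modified_pages = sorted(
        url for url in current
        if url in previous and current[url] != previous[url]
    )
    return new_pages, modified_pages


# Hypothetical snapshots taken one week apart.
last_week = {
    "http://www.example.org/a.html": "f1",
    "http://www.example.org/b.html": "f2",
}
this_week = {
    "http://www.example.org/a.html": "f1",   # unchanged
    "http://www.example.org/b.html": "f9",   # modified
    "http://www.example.org/c.html": "f3",   # new
}
new_pages, modified_pages = compare_snapshots(last_week, this_week)
print("new:", new_pages)
print("modified:", modified_pages)
```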

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
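A minimal sketch of how a stored interest profile could be matched against the description of a new book. The class names, the matching rule (a keyword in the title OR a shared subject category) and the sample records are assumptions made for the illustration, not the method of any particular bookshop.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    keywords: set = field(default_factory=set)
    subject_categories: set = field(default_factory=set)


@dataclass
class BookRecord:
    title: str
    subject_categories: set = field(default_factory=set)


def matches(profile, book):
    """True when the book fits the profile by keyword or by subject category."""
    title_words = {word.strip(".,:;!?").lower() for word in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    category_hit = bool(profile.subject_categories & book.subject_categories)
    return keyword_hit or category_hit


# Hypothetical profile and newly published books.
profile = InterestProfile(keywords={"aquaculture"},
                          subject_categories={"Marine science"})
new_books = [
    BookRecord("Aquaculture: principles and practice", {"Agriculture"}),
    BookRecord("Coastal ecology", {"Marine science"}),
    BookRecord("Medieval history", {"History"}),
]
for book in new_books:
    if matches(profile, book):
        print("ALERT:", book.title)
```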

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
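The idea of simultaneous searching can be sketched as follows. The three target functions are stand-ins invented for this illustration (a real meta-search service would query library and bookshop systems over the network); one of them fails on purpose to show that the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical target systems; each returns a list of book titles.
def search_library_a(query):
    return ["Marine Ecology", "Ocean Currents"]


def search_bookshop_b(query):
    return ["marine ecology", "Fisheries Management"]


def search_library_c(query):
    raise ConnectionError("target system not available")


TARGETS = {
    "library A": search_library_a,
    "bookshop B": search_bookshop_b,
    "library C": search_library_c,
}


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the titles."""
    merged = {}   # case-insensitive title -> first spelling seen
    failed = []   # names of targets that could not be searched
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in targets.items()}
        for name, future in futures.items():
            try:
                titles = future.result(timeout=10)
            except Exception:
                failed.append(name)
                continue
            for title in titles:
                merged.setdefault(title.casefold(), title)
    return sorted(merged.values()), failed


titles, failed = meta_search("marine", TARGETS)
print(titles)
print("not available:", failed)
```

Only the lowest common denominator of the targets (here: a plain query string) can be passed on, which illustrates why search options in such services tend to be limited.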

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
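The relations can be represented in a simple data structure. The sketch below is only an illustration: the class name, the field names and the sample vocabulary are invented here. It also checks one property that follows from the definitions, namely that BT and NT are reciprocal (if A has B as broader term, then B should list A as narrower term).

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (synonyms)


def missing_reciprocals(entries):
    """List (term, "BT", broader term) triples that lack the matching NT."""
    problems = []
    for term, entry in entries.items():
        for broader in entry.broader:
            if broader not in entries or term not in entries[broader].narrower:
                problems.append((term, "BT", broader))
    return problems


# Hypothetical mini-thesaurus.
entries = {
    "water body": ThesaurusEntry("water body", narrower=["sea"]),
    "sea": ThesaurusEntry("sea", broader=["water body"],
                          narrower=["coastal sea"], related=["coast"],
                          used_for=["seas"]),
    "coastal sea": ThesaurusEntry("coastal sea", broader=["sea"]),
    "lagoon": ThesaurusEntry("lagoon", broader=["water body"]),
}
print(missing_reciprocals(entries))
```

In this sample, "lagoon" points to "water body" as broader term, but "water body" does not list "lagoon" as narrower term, so that pair is reported.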

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
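The procedure described above can be sketched as a small query-expansion function. The mini-thesaurus, the choice to add synonyms (UF) and narrower terms (NT), and the Boolean AND/OR syntax are assumptions made for the illustration; the syntax that a particular database accepts should be checked in its help pages.

```python
def expand_query(concepts, thesaurus):
    """Build a Boolean query: synonyms/narrower terms OR-ed, concepts AND-ed."""
    groups = []
    for concept in concepts:
        entry = thesaurus.get(concept, {})
        terms = [concept] + entry.get("UF", []) + entry.get("NT", [])
        unique = list(dict.fromkeys(terms))          # drop duplicates, keep order
        quoted = ['"%s"' % t if " " in t else t for t in unique]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


# Hypothetical thesaurus fragment found in a first step.
thesaurus = {"sea": {"UF": ["ocean"], "NT": ["coastal sea"]}}
print(expand_query(["sea", "pollution"], thesaurus))
```

Adding synonyms and narrower terms mainly increases recall; combining several concepts with AND is what keeps precision acceptable.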

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
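Both directions can be sketched with a small reference list per document. The document identifiers are invented; the sketch assumes that the reference lists are already available in machine-readable form, which in practice is the work done by the producer of a citation index.

```python
def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)
    return cited_by


# Hypothetical reference lists (document -> documents it cites).
references = {
    "seed-2000": ["a-1995", "b-1990"],
    "a-1995": ["b-1990", "c-1980"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}
print("older, via snowballing:", snowball("seed-2000", references))
print("more recent, via citation index:",
      build_citation_index(references).get("seed-2000", []))
```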

211

***-

Information retrieval:
using citations (scheme)
[Scheme: the seed document on a time axis (Past, Now, Future); snowball citation searching points towards the past, citation indexing towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
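A sketch of how the counts mentioned above could be estimated once the results of a citation search are available. The data are invented, and the counting rules (each citing document counts once per cited article, each author of an article receives the full count) are assumptions; any real count depends on the coverage of the database that is searched.

```python
from collections import Counter


def citation_counts(references, authors_of):
    """Return (citations per article, citations per author)."""
    per_article = Counter()
    for citing, cited_list in references.items():
        for cited in set(cited_list):        # a citing document counts once
            per_article[cited] += 1
    per_author = Counter()
    for article, count in per_article.items():
        for author in authors_of.get(article, []):
            per_author[author] += count
    return per_article, per_author


# Hypothetical citation search results.
references = {"p3": ["p1", "p2"], "p4": ["p1"], "p5": ["p1", "p1"]}
authors_of = {"p1": ["Author A"], "p2": ["Author A", "Author B"]}
per_article, per_author = citation_counts(references, authors_of)
print(dict(per_article))
print(dict(per_author))
```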

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document
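On a small scale, for instance on an intranet where the pages themselves can be read, the same kind of analysis can be sketched with the HTML parser of the Python standard library. The page contents and URLs below are invented, and the comparison of URLs is deliberately simple (only a trailing slash is ignored).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target_url, pages):
    """Return the URLs of the pages that contain a link to target_url."""
    target = target_url.rstrip("/")
    citing = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if any(link.rstrip("/") == target for link in collector.links):
            citing.append(page_url)
    return citing


# Hypothetical pages (URL -> HTML source).
pages = {
    "http://www.example.org/one.html":
        '<p>See <a href="http://www.example.org/website/">this site</a>.</p>',
    "http://www.example.org/two.html":
        '<p>No relevant <a href="http://www.example.org/other/">link</a>.</p>',
}
citing = pages_linking_to("http://www.example.org/website", pages)
print(len(citing), "citing page(s):", citing)
```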

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
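The variants listed above can be generated automatically. This is a sketch under assumptions: the function name is invented, the default file name index.html is only one of several possible defaults, and the scheme (http://) is dropped here although some search systems may expect it.

```python
def url_variants(url, index_name="index.html"):
    """Return the three spellings under which a site may be linked."""
    address = url.split("://", 1)[-1]            # drop "http://" if present
    if address.endswith("/" + index_name):
        address = address[: -len(index_name)]    # keep the trailing slash
    base = address.rstrip("/")
    return [base + "/" + index_name, base + "/", base]


# Hypothetical address of a web site that you maintain.
for variant in url_variants("http://www.example.org/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link search, and the results can be merged.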

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
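
The two components described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine is built: the pages in `PAGES` are hypothetical stand-ins for what a crawler ("robot") would collect, and the function names are assumptions made for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler would have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Open access journals in biology",
}
INDEX = build_index(PAGES)
```

The query is answered from the prebuilt index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.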

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
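
The idea that a page ranks higher when many pages, and "important" pages, point to it can be illustrated with a simplified PageRank-style iteration. The sketch below is an assumption-laden teaching example, not Google's actual ranking algorithm: the link graph `LINKS`, the damping value and the function name are all chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score for each page.

    `links` maps a page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D point to C; C points to A.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
RANK = link_rank(LINKS)
```

In this small graph, C scores highest because three pages point to it. A scores higher than B or D although only one page points to it, because that one page is the "important" page C.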

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
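
What such an automatic expansion could amount to is sketched below. The synonym table `SYNONYMS` is a hypothetical, tiny stand-in for a real thesaurus, and the OR-group output format is an assumption made for illustration; how Google handles the tilde internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "sea": ["ocean", "marine"],
    "fish": ["fishes", "fishery"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

For example, `expand_query("pollution ~sea coast")` gives `pollution (sea OR ocean OR marine) coast`: only the word marked with the tilde is expanded.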

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
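
Recall and precision, mentioned above, can be computed as follows. The document identifiers and result sets are invented for illustration only; they are not measurements of any real search engine.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical example: four documents are relevant to the information need.
RELEVANT = {"d1", "d2", "d3", "d4"}
# A general engine returns many hits, most of them noise.
GENERAL = {"d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"}
# A specialised engine returns fewer hits, most of them relevant.
SPECIALISED = {"d1", "d2", "d3", "x1"}
```

In this invented case the general engine has a precision of 0.25 and a recall of 0.5, while the specialised engine reaches 0.75 for both.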

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
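
For readers unfamiliar with the terms, the sketch below shows what truncation and a proximity (NEAR) operator mean in a retrieval system. It is a simplified illustration in Python with hypothetical function names; it does not describe the query syntax of any particular search service.

```python
import re

def truncation_match(pattern, text):
    """Return the words in text that match a truncated term such as 'comput*'."""
    regex = re.compile(re.escape(pattern).replace(r"\*", r"\w*"), re.IGNORECASE)
    return [w for w in re.findall(r"\w+", text) if regex.fullmatch(w)]

def near(text, word_a, word_b, distance=3):
    """True if word_a and word_b occur within `distance` words of each other."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

TEXT = "Computers and computing changed how libraries compute rankings"
```

Here `truncation_match("comput*", TEXT)` finds "Computers", "computing" and "compute" with one search term, and `near(...)` lets the searcher require that two words appear close together rather than anywhere in a long document.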

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram of layers, from top to bottom: User; an Internet meta-search system; Internet search systems 1 and 2; the collected database of each Internet search system; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
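
Clustering search results on the fly can be illustrated with a very simple heuristic. The sketch below is not Vivisimo's method, which is not described in these slides; it merely files each result under the most widely shared word in the result set. The stopword list, the sample results and the function name are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, results):
    """Group (title, snippet) results under a heading word chosen on the fly.

    Each result is filed under the word (not a stopword, not a query word)
    that occurs in the largest number of results and also in that result.
    """
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = []
    for title, snippet in results:
        words = set(re.findall(r"[a-z]+", (title + " " + snippet).lower()))
        word_sets.append(words - STOPWORDS - query_words)
    # Document frequency: in how many results does each word occur?
    doc_freq = Counter(w for words in word_sets for w in words)
    clusters = defaultdict(list)
    for (title, _), words in zip(results, word_sets):
        # Ties are broken alphabetically so that the output is deterministic.
        heading = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[heading].append(title)
    return dict(clusters)

# Hypothetical results for the ambiguous query "pascal".
RESULTS = [
    ("Blaise Pascal biography", "French philosopher and mathematician"),
    ("Pascal on religion", "Notes by the philosopher"),
    ("Pascal tutorial", "Learn the programming language"),
    ("Free Pascal compiler", "Compiler for the programming language"),
]
CLUSTERS = cluster_results("pascal", RESULTS)
```

With these sample results, the two pages about the person end up under the heading "philosopher" and the two pages about the programming language under "language", without any processing of the documents before the search.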

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
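
The integration step can be sketched as follows: several search systems are queried at the same time, and their ranked lists are merged with repeated results removed. The two "engines" below are hypothetical stand-ins that return fixed lists, and the URL normalization is deliberately crude; a real meta-search system would call real services and need more careful matching.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_one(query):
    """Stand-in for a real search system; returns a ranked list of URLs."""
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

def engine_two(query):
    """Second stand-in, with partly overlapping results."""
    return ["http://example.org/b/", "http://EXAMPLE.org/d", "http://example.org/a"]

def normalize(url):
    """Crude normalization so that trivial variants count as repeats.

    Lowercasing the whole URL is a simplification made for this sketch.
    """
    return url.lower().rstrip("/")

def meta_search(query, engines):
    """Query all engines at the same time and merge their results."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave the lists so that the top hits of each engine come first.
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(key)
    return merged
```

Calling `meta_search("marine biology", [engine_one, engine_two])` yields four distinct URLs from the six returned in total, because two of them were repeats.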

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of what is accessible through the Internet / WWW:
• telnet, ftp, ...
• databases and file archives accessible through the Internet (CGI, ASP, ...)
• static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• information accessible only when passwords are used
• rapidly changing information, such as news
• Word files and PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
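As a rough illustration of how such a tracking service might decide that a page has changed, the sketch below compares a stored fingerprint of each page with a fingerprint of the newly fetched text. The URLs, the page texts and the function names are invented for this example; real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of the text of a page."""
    # Collapse whitespace so that layout-only changes do not trigger an alert.
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous: dict[str, str], current_texts: dict[str, str]) -> list[str]:
    """List the URLs that are new or whose text differs from the stored fingerprint."""
    alerts = []
    for url, text in current_texts.items():
        if previous.get(url) != fingerprint(text):
            alerts.append(url)
    return sorted(alerts)


# Hypothetical data: fingerprints stored at the previous check, and freshly fetched texts.
stored = {
    "http://example.org/a": fingerprint("Old text"),
    "http://example.org/b": fingerprint("Same text"),
}
fetched = {
    "http://example.org/a": "New text",
    "http://example.org/b": "Same   text",
    "http://example.org/c": "Brand new page",
}
print(changed_pages(stored, fetched))
```

In this toy run, page a is reported as modified and page c as new, while page b is ignored because only its whitespace changed.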

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
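A minimal sketch of the keyword variant of such an interest profile is given below: a new title triggers an alert when it shares at least one word with the stored keywords. The profile, the titles and the function names are invented for illustration; the matching used by real bookshop services is not documented here.

```python
def matches_profile(title: str, keywords: set[str]) -> bool:
    """True when at least one word of the title occurs in the keyword profile."""
    words = {word.strip(".,:;!?()").lower() for word in title.split()}
    return bool(words & keywords)


def new_book_alerts(new_titles: list[str], profile_keywords: set[str]) -> list[str]:
    """Return the new titles that fit the stored interest profile."""
    keywords = {keyword.lower() for keyword in profile_keywords}
    return [title for title in new_titles if matches_profile(title, keywords)]


# Hypothetical interest profile and hypothetical list of newly published titles.
profile = {"oceanography", "fisheries"}
titles = [
    "Introduction to Oceanography",
    "Modern Cookery",
    "Fisheries Management: A Primer",
]
alerts = new_book_alerts(titles, profile)
print(alerts)
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the words of the title.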

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
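The idea of simultaneous searching can be sketched as follows: one query is sent in parallel to several catalogues, and the results are merged while duplicates are dropped. The two in-memory catalogues, their record layout and the placeholder identifiers are assumptions made for this example; a real meta-search service would contact remote systems, for instance through Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalogue(records: list[dict], query: str) -> list[dict]:
    """Very simple title search in one catalogue (a list of records)."""
    needle = query.lower()
    return [record for record in records if needle in record["title"].lower()]


def meta_search(catalogues: dict[str, list[dict]], query: str) -> dict[str, dict]:
    """Search all catalogues in parallel and merge the results by identifier."""
    names = sorted(catalogues)
    with ThreadPoolExecutor() as pool:
        result_lists = list(
            pool.map(lambda name: search_catalogue(catalogues[name], query), names)
        )
    merged: dict[str, dict] = {}
    for name, results in zip(names, result_lists):
        for record in results:
            entry = merged.setdefault(
                record["id"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(name)
    return merged


# Hypothetical catalogues with placeholder identifiers.
catalogues = {
    "library_a": [
        {"id": "id-1", "title": "Marine Ecology"},
        {"id": "id-2", "title": "River Hydrology"},
    ],
    "bookshop_b": [
        {"id": "id-1", "title": "Marine Ecology"},
        {"id": "id-3", "title": "Marine Pollution"},
    ],
}
hits = meta_search(catalogues, "marine")
print(hits)
```

The merged result also shows why the search options of such services tend to be limited: only a query form that every target system understands (here a plain title word) can be sent to all of them.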

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
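The procedure described above can be sketched as a small query expansion step: the searcher looks up a term, collects synonyms (UF) and narrower terms (NT), and combines them into one query. The tiny thesaurus below is invented for this example, and the OR syntax with quoted phrases is an assumption; the target database may require a different query syntax.

```python
# Toy thesaurus entry, invented for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term: str, thesaurus: dict, relations: tuple = ("UF", "NT")) -> str:
    """Combine a term with the thesaurus terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put phrases between quotes so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms mainly serves recall; replacing an ambiguous word by a more suitable term found in the thesaurus mainly serves precision.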

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
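Both directions can be illustrated with a small citation graph. In the sketch below, the reference lists of a few documents are stored in a dictionary; following them backwards gives the snowball effect, and inverting them gives a simple citation index. The document identifiers and the data are invented for this example.

```python
def cited_by_index(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the reference lists: for each document, list the documents that cite it."""
    index: dict[str, list[str]] = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


def snowball(seed: str, references: dict[str, list[str]]) -> list[str]:
    """Follow reference lists backwards from the seed document to older work."""
    found: list[str] = []
    queue = list(references.get(seed, []))
    while queue:
        document = queue.pop(0)
        if document != seed and document not in found:
            found.append(document)
            queue.extend(references.get(document, []))
    return found


# Hypothetical reference lists: each key cites the documents in its list.
references = {
    "seed2000": ["a1995", "b1990"],
    "a1995": ["b1990", "c1985"],
    "d2003": ["seed2000"],
    "e2004": ["seed2000", "a1995"],
}
print(snowball("seed2000", references))
print(cited_by_index(references)["seed2000"])
```

The first result contains only older publications; the second contains only more recent ones, which cannot be found from the seed document itself and therefore require a citation index.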

211

***-

Information retrieval:
using citations (scheme)
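[Scheme: a time axis running from past through now to future, with the
seed document on it. Snowball citation searching leads from the seed
document back to older publications; citation indexing leads forward
to more recent publications that cite the seed document.]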

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
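How such counts could be derived from the results of a citation search is sketched below: citations are counted per article, and then summed per author. The reference lists, the article identifiers and the author names are invented; real counts depend strongly on the coverage of the database that is searched.

```python
from collections import Counter


def citation_counts(reference_lists: dict[str, list[str]]) -> Counter:
    """Count, for each cited article, the number of documents that cite it."""
    counts: Counter = Counter()
    for cited_list in reference_lists.values():
        # A citing document counts only once, even if it repeats a reference.
        counts.update(set(cited_list))
    return counts


def author_citations(counts: Counter, authorship: dict[str, list[str]]) -> Counter:
    """Sum the citation counts of the articles of each author."""
    totals: Counter = Counter()
    for article, authors in authorship.items():
        for author in authors:
            totals[author] += counts.get(article, 0)
    return totals


# Hypothetical citing documents p1..p3 with their reference lists.
reference_lists = {"p1": ["x", "y"], "p2": ["x"], "p3": ["x", "y", "y"]}
# Hypothetical authorship of the cited articles x and y.
authorship = {"x": ["Author A"], "y": ["Author A", "Author B"]}

per_article = citation_counts(reference_lists)
per_author = author_citations(per_article, authorship)
print(per_article["x"], per_article["y"])
print(per_author["Author A"], per_author["Author B"])
```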

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
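Generating these variants by hand is error-prone, so a small helper can be useful. The sketch below derives the three variants listed above from any one of them; the function name and the assumption that the default page is called index.html or index.htm are choices made for this example. How each variant is then entered in a link search depends on the search system.

```python
def url_variants(url: str) -> list[str]:
    """Return the variants of a site URL that should all be searched."""
    base = url.rstrip("/")
    # Strip a default page name, if present, to get the bare directory URL.
    for index_name in ("/index.html", "/index.htm"):
        if base.endswith(index_name):
            base = base[: -len(index_name)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```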

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 60

1

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3 (!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
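
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any real search engine: the page texts and example.org addresses are hypothetical and stand in for what a crawler ("robot") would collect, and the names `build_index` and `search` are chosen here for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a crawler would have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Civil engineering handbook",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the pre-built index only; the pages themselves are not contacted at query time, which is why results can lag behind the live WWW.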

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search
engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
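
The two ranking rules above (many pages point to it, "important" pages point to it) can be illustrated with a simplified link-analysis sketch in the style of the published PageRank idea. This is not Google's actual algorithm: the damping value, the iteration count and the small link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively from the link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score to each target.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A and B point to C, C points to D, E points to A.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}

RANK = link_rank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this graph A and D each receive exactly one link, but D ends up higher because its single link comes from the well-linked page C.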

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
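
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are hypothetical; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fish": ["fishes", "seafood"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.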

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
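
Recall and precision can be made concrete with a short calculation. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
print(precision_recall(retrieved, relevant))
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).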

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
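
For readers unfamiliar with truncation, the following sketch shows what a right-truncated query term would match in a retrieval system that supports it. The `*` wildcard syntax and the word list are assumptions for illustration; they do not describe any Google feature.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(prefix))
    # Without the wildcard only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical index vocabulary.
WORDS = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", WORDS))
print(truncation_match("library", WORDS))
```

Without truncation or stemming, the searcher has to type each word variant (library, libraries, librarian…) explicitly.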

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
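
A proximity operator such as NEAR checks how far apart two query words are in a document. The sketch below is a minimal illustration of the idea; the function name `near` and the default distance of 5 words are assumptions, not the behaviour of a specific search system.

```python
import re


def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pos1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    pos2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)


TEXT = "open access journals in marine science"
print(near(TEXT, "open", "journals", max_distance=2))
print(near(TEXT, "open", "science", max_distance=2))
```

A plain AND query would accept both word pairs; the proximity test accepts only the pair that occurs close together.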

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
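
Clustering on the fly can be illustrated with a deliberately naive sketch: each result is filed under the snippet word that it shares with the most other results. This is not the Vivisimo algorithm, which is not described here; the stopword list, the labelling rule and the sample results are all assumptions.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "his"}


def cluster_results(query, results):
    """Group (url, snippet) results under the most widely shared snippet word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def candidate_words(snippet):
        return {
            w
            for w in re.findall(r"[a-z]+", snippet.lower())
            if w not in STOPWORDS and w not in query_words
        }

    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter()
    for _, snippet in results:
        doc_freq.update(candidate_words(snippet))

    clusters = defaultdict(list)
    for url, snippet in results:
        candidates = candidate_words(snippet)
        if candidates:
            # Prefer the label shared by most results; break ties alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(url)
    return dict(clusters)


# Hypothetical results for the ambiguous query "pascal".
RESULTS = [
    ("u1", "Blaise Pascal, French philosopher and mathematician"),
    ("u2", "Pascal programming language tutorial"),
    ("u3", "The philosopher Pascal and his wager"),
    ("u4", "Compilers for the Pascal programming language"),
]

print(cluster_results("pascal", RESULTS))
```

Nothing is computed before the query arrives: the clusters are derived only from the snippets returned for this one search.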

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
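
The integration step with removal of repeated results can be sketched as follows. This is an illustrative assumption about how a meta-search system might merge result lists, not the method of any system named above: the lists are interleaved round-robin and two addresses count as the same page when host (without "www.") and path match.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Comparison key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each page."""
    seen = set()
    merged = []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked answers from two primary search systems.
ENGINE_A = ["http://www.dogpile.com/", "http://vivisimo.com/", "http://www.kartoo.com"]
ENGINE_B = ["http://vivisimo.com", "http://dogpile.com", "http://www.mamma.com"]

print(merge_results([ENGINE_A, ENGINE_B]))
```

The normalization is intentionally crude (query strings are ignored, for example), so a real system would need more careful duplicate detection.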

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
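
A minimal sketch of how such change tracking could work, assuming the service keeps a fingerprint of the page text from its previous visit and compares it with the current one. The two snapshots and the function names are hypothetical and only serve as illustration; real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """True when the two snapshots differ in content."""
    return fingerprint(old_text) != fingerprint(new_text)


# Hypothetical snapshots of one page, taken on two different days.
snapshot_monday = "Open access journals:  12 titles listed"
snapshot_friday = "Open access journals: 14 titles listed"

if has_changed(snapshot_monday, snapshot_friday):
    print("Page changed: send an alert to the subscriber")
```

Storing only the fingerprint, not the full page, keeps the amount of data per tracked page small; the price is that the service can report that a page changed, but not what changed.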

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
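
As an illustration of a keyword-based interest profile, the sketch below matches a stored set of keywords against the titles of newly published books. The profile, the titles and the function name are hypothetical; this is not a description of how any particular bookshop implements its alerts.

```python
def matches_profile(title, keywords):
    """True when at least one profile keyword occurs as a word in the title."""
    words = {w.strip(".,:;()").lower() for w in title.split()}
    return bool(words & {k.lower() for k in keywords})


# Hypothetical interest profile stored on the server for one user.
profile = {"oceanography", "fisheries"}

# Hypothetical list of newly published titles.
new_books = [
    "Introduction to Oceanography",
    "A History of Printing",
    "Fisheries Management in Practice",
]

alerts = [title for title in new_books if matches_profile(title, profile)]
print(alerts)
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the words in its title.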

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
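
The principle of simultaneous searching can be sketched as follows, assuming each target catalogue can be queried independently and the results are merged afterwards. The three catalogues are hypothetical in-memory stand-ins; a real meta-search service would send the query to remote systems instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalogues.
CATALOGUES = {
    "library_a": ["Marine Biology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Ecology"],
    "bookshop_c": ["Marine Biology", "Tides Explained"],
}


def search_catalogue(name, query):
    """Case-insensitive substring search in the titles of one catalogue."""
    return [t for t in CATALOGUES[name] if query.lower() in t.lower()]


def meta_search(query):
    """Query all catalogues in parallel; map each title to the catalogues holding it."""
    names = list(CATALOGUES)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda n: search_catalogue(n, query), names))
    merged = {}
    for name, titles in zip(names, result_lists):
        for title in titles:
            merged.setdefault(title, []).append(name)
    return merged


print(meta_search("ocean"))
```

The sketch also shows why search options are limited: the meta-search can only offer what every target system supports, here a plain title search.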

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
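
The procedure described above can be sketched as a small query expansion step. The miniature thesaurus below is hypothetical and only illustrative; the relation codes (UF, NT, BT, RT) follow the usual thesaurus conventions, and the Boolean OR syntax is an assumption about the target database.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],        # synonym(s)
        "NT": ["North Sea"],    # narrower term(s)
        "BT": ["water body"],   # broader term(s)
        "RT": ["coast"],        # related term(s)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and its selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    return " OR ".join('"{}"'.format(t) for t in terms)


print(expand_query("sea"))
```

Adding synonyms and narrower terms tends to increase recall; adding broader or related terms as well would retrieve even more, but usually at the cost of precision.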

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
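
Both directions of citation searching can be illustrated with a small, hypothetical set of citation data: following the references of the seed document backwards in time (the snowball effect), and inverting the data to find later documents that cite the seed, which is what a citation index makes possible. All document names are invented for the example.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed_2000": ["paper_1995", "paper_1998"],
    "paper_1998": ["paper_1990"],
    "paper_2003": ["seed_2000"],
    "paper_2004": ["seed_2000", "paper_1998"],
}


def snowball(doc, cites):
    """Follow references backwards in time, including references of references."""
    found, queue = set(), [doc]
    while queue:
        current = queue.pop()
        for ref in cites.get(current, []):
            if ref not in found:
                found.add(ref)
                queue.append(ref)
    return found


def cited_by(doc, cites):
    """Find the documents that cite doc, as a citation index allows."""
    return {citing for citing, refs in cites.items() if doc in refs}


print(sorted(snowball("seed_2000", CITES)))
print(sorted(cited_by("seed_2000", CITES)))
```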

211

***-

Information retrieval:
using citations (scheme)
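[Scheme: along a time axis (past, now, future), snowball citation searching leads from the seed document back to older publications, while citation indexing leads forward to more recent publications that cite the seed document.]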

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
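
To make the counting concrete, the sketch below tallies citations per article and per author from a handful of hypothetical citation records, such as the results of a citation search might provide. The records are invented, and the counts are only as complete as the database that was searched.

```python
from collections import Counter

# Hypothetical records: (citing document, cited article, author of the cited article).
CITATIONS = [
    ("doc1", "article_A", "Author X"),
    ("doc2", "article_A", "Author X"),
    ("doc2", "article_B", "Author X"),
    ("doc3", "article_C", "Author Y"),
]

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited, _ in CITATIONS)

# Number of citations received by each author, summed over all articles.
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```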

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
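
The variants listed above can be generated systematically before running the link searches. This is a small sketch; the function name is hypothetical, and index.html as the default document name is an assumption that does not hold for every server.

```python
def url_variants(host, path, default_document="index.html"):
    """Return the common ways in which one and the same page may be linked."""
    base = "//{}/{}".format(host, path.strip("/"))
    return [
        "{}/{}".format(base, default_document),  # explicit default document
        base + "/",                               # directory with trailing slash
        base,                                     # directory without trailing slash
    ]


for variant in url_variants("web-server-computer.country", "website"):
    print(variant)
```

The link counts found for each variant then have to be combined, because search engines typically treat the variants as different URLs.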

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
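
The difference with link searching can be sketched as follows: a plain text search for the URL string also finds pages that mention the address without making a hyperlink to it. The page contents below are hypothetical, and a real search engine would of course search its own index, not a small dictionary.

```python
TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"

# Hypothetical pages: one links to the target, one mentions it in plain text.
PAGES = {
    "http://example.org/links.html":
        '<a href="http://www.computer.com/directory/subdirectory/web-page.html">a guide</a>',
    "http://example.org/review.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://example.org/other.html":
        "Nothing relevant here.",
}


def pages_mentioning(target, pages):
    """Return the addresses of all pages whose text contains the target URL string."""
    return sorted(url for url, text in pages.items() if target in text)


print(pages_mentioning(TARGET, PAGES))
```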

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 61

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
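
The mechanism described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the three example.org pages stand in for the WWW, the function names crawl, build_index and search are hypothetical, and combining query words with an implicit AND is a choice of this sketch, not a statement about any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical miniature "web": URL -> (page text, outgoing links).
PAGES = {
    "http://example.org/a": ("Marine biology of the North Sea", ["http://example.org/b"]),
    "http://example.org/b": ("Hydrology and marine science", ["http://example.org/c"]),
    "http://example.org/c": ("Library science and information retrieval", []),
}

def crawl(start_url):
    """The "robot": follow links from start_url and collect page texts."""
    seen, queue, collected = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        text, links = PAGES[url]
        collected[url] = text
        queue.extend(links)
    return collected

def build_index(collected):
    """Inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(crawl("http://example.org/a"))
print(sorted(search(index, "marine")))
```

The query is answered from the prepared index only; no page is fetched at search time, which is why such systems can respond quickly but can also be out of date.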

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
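
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis calculation. This is only a sketch in the style of the published PageRank idea, not Google's actual implementation; the function name link_rank, the damping value 0.85, the number of iterations and the four-page link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's score over the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score to each target.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A and B both point to C; C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

rank = link_rank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph C scores higher than A or B because two pages point to it, and D scores high although only one page points to it, because that one page (C) is itself "important".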

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
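
What such an automatic expansion amounts to can be sketched as follows. This is an illustration only: the synonym table SYNONYMS and the function expand_query are hypothetical, and writing the expanded query as an OR group is an assumption of this sketch; how Google handles the tilde internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("buy ~cheap car"))
```

Only the word marked with the tilde is expanded; the other words in the query are left unchanged.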

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
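
Recall and precision, the two measures mentioned above, can be computed for a single search as in the following sketch. The function name precision_recall and the document identifiers d1 to d5 are hypothetical; the definitions used are the standard ones (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).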

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
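
Clustering "on the fly" can be illustrated with a deliberately simple sketch that groups result snippets by the most widely shared word other than the query words. This is not Vivisimo's algorithm, which is not described in these slides; the function cluster_results, the stop word list and the sample snippets for the ambiguous query pascal are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())

    def terms(text):
        words = re.findall(r"[a-z]+", text.lower())
        return {w for w in words if w not in STOPWORDS and w not in query_words}

    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pascal the philosopher: life of Blaise Pascal",
]

for label, group in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", group)
```

The clusters are computed only from the results returned for this one query, so no pre-processing of the whole document collection is needed.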

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
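
Such an integration of results with removal of repeats can be sketched as follows. The sketch assumes that each primary search system returns a ranked list of URLs; the functions normalise and merge_results, the interleaving strategy and the two sample result lists are hypothetical and do not describe any particular meta-search system.

```python
def normalise(url):
    """Crude normalisation so that trivially different forms compare equal."""
    url = url.strip()
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.vub.ac.be/BIBLIO/", "http://bubl.ac.uk/link/"]
engine_2 = ["http://bubl.ac.uk/link", "http://www.dmoz.org/"]

print(merge_results([engine_1, engine_2]))
```

The path part of a URL is not lower-cased during normalisation, because paths can be case-sensitive.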

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
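
The tracking of changed or new pages described above can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular alert service works: the function names, the example URLs and the page texts are invented, and the page texts are passed in directly instead of being downloaded.

```python
import hashlib

def fingerprint(page_text):
    """Return a stable fingerprint of the text of a page."""
    normalised = " ".join(page_text.split())  # ignore whitespace-only changes
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def changed_pages(previous, current_texts):
    """List the URLs that are new or whose fingerprint differs from the stored one."""
    alerts = []
    for url, text in current_texts.items():
        if previous.get(url) != fingerprint(text):
            alerts.append(url)
    return alerts

# Hypothetical data: fingerprints stored at the previous visit ...
stored = {"http://example.org/a": fingerprint("Tide tables for 2004")}
# ... and the texts found at the current visit.
fetched = {
    "http://example.org/a": "Tide tables for 2005",
    "http://example.org/b": "New page on coastal erosion",
}
print(changed_pages(stored, fetched))
```

A client program or a server-based alert service would run such a comparison at regular intervals and send the resulting list to the user, for instance by email.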

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
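
How a stored interest profile can be matched against newly published books may be sketched as follows. This is a minimal illustration, not the method of any real bookshop: the profile, the book titles and the category names are invented sample data.

```python
def matches_profile(book, profile):
    """True when a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical profile stored on the server: keywords and subject categories.
profile = {"keywords": ["aquaculture", "mangrove"], "categories": ["Marine biology"]}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Mangrove Ecosystems of Asia", "category": "Ecology"},
    {"title": "Handbook of Marine Mammals", "category": "Marine biology"},
    {"title": "Medieval Trade Routes", "category": "History"},
]

alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```

The first book is selected through a keyword, the second through its subject category; the third fits neither and generates no alert.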

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
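
The principle of simultaneous searching can be sketched as below. This is an assumption-laden illustration and not the implementation of any of the services mentioned: the three "catalogues" are stand-in functions with invented records, and records are merged on an invented ISBN field.

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(query, catalogues):
    """Query every catalogue in parallel and merge the records by ISBN."""
    def safe_search(item):
        name, search = item
        try:
            return [(name, record) for record in search(query)]
        except Exception:
            return []  # an unavailable target simply contributes nothing

    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(safe_search, catalogues.items()))

    merged = {}
    for batch in batches:
        for name, record in batch:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "found_in": []}
            )
            entry["found_in"].append(name)
    return merged

# Stand-in target systems (hypothetical data).
def library_a(query):
    return [{"isbn": "111", "title": "Marine Ecology"}]

def bookshop_b(query):
    return [
        {"isbn": "111", "title": "Marine Ecology"},
        {"isbn": "222", "title": "Coastal Management"},
    ]

def offline_c(query):
    raise ConnectionError("target system not available")

result = search_all(
    "marine",
    {"library_a": library_a, "bookshop_b": bookshop_b, "offline_c": offline_c},
)
print(result)
```

The sketch shows both points made above: the merged result covers more than any single target, but it depends on which target systems respond, and only the search options that all targets share (here one simple query string) can be offered.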

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
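
The use of a thesaurus before searching can be sketched as a simple query expansion. This is a minimal sketch under stated assumptions: the small thesaurus entry is invented for illustration, and the Boolean `OR` syntax is an assumption, since each search system has its own query syntax.

```python
# Hypothetical thesaurus entry with the usual relations:
# UF = synonyms, NT = narrower terms, BT = broader terms, RT = related terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon"],
        "BT": ["water body"],
        "RT": ["coast"],
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term and the thesaurus terms related to it."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Put multi-word terms between quotes so they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)

print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "BT")))
```

Adding synonyms and narrower terms tends to increase recall; replacing a vague term by a more suitable term found in the thesaurus tends to increase precision.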

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
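
Both directions of citation searching can be illustrated with a small citation graph. This is only a sketch: the document names and the citation links are invented, and a real citation index is of course far larger.

```python
# Hypothetical citation data: each document lists the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(seed, cites):
    """Backward search: follow reference lists from the seed to older documents."""
    found, queue = [], [seed]
    while queue:
        document = queue.pop(0)
        for reference in cites.get(document, []):
            if reference != seed and reference not in found:
                found.append(reference)
                queue.append(reference)  # the snowball effect
    return found

def cited_by(document, cites):
    """Forward search, as in a citation index: who cites this document?"""
    return sorted(d for d, references in cites.items() if document in references)

print(snowball("seed", CITES))
print(cited_by("seed", CITES))
```

The backward search needs only the documents themselves, while the forward search needs an index that has been built over the reference lists of many documents.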

211

***-

Information retrieval:
using citations (scheme)
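(Scheme on a time axis: past, now, future)
• Snowball citation searching: from the seed document
back to older publications (past)
• Citation indexing: from the seed document forward to more
recent publications that cite it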

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
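
Generating the variants listed above can be automated. The function below is a sketch with a hypothetical name; the default file name `index.html` is an assumption, because a web server may use another default page.

```python
def url_variants(url, index_name="index.html"):
    """Return the spellings under which other pages may link to the same document."""
    stem = url.split("//", 1)[-1]  # drop the scheme, as in the variants above
    if stem.endswith("/" + index_name):
        stem = stem[: -len(index_name) - 1]
    stem = stem.rstrip("/")
    return [
        "//%s/%s" % (stem, index_name),
        "//%s/" % stem,
        "//%s" % stem,
    ]

for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, in the query syntax of the chosen search system, and the results can be combined.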

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 62

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
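
The difference between the two orderings can be illustrated with a minimal sketch. The entries and their link counts below are invented for illustration, and the real link analysis used by Google is more elaborate than a plain count of incoming links.

```python
# Hypothetical directory entries: (site name, number of incoming links).
ENTRIES = [
    ("Alpha Marine Lab", 12),
    ("Beta Ocean Portal", 340),
    ("Gamma Fisheries", 57),
]

def alphabetical(entries):
    """Order entries by name only, as in a plain alphabetical listing."""
    return [name for name, _ in sorted(entries)]

def by_inbound_links(entries):
    """Order entries by link count (highest first), then by name."""
    ranked = sorted(entries, key=lambda entry: (-entry[1], entry[0]))
    return [name for name, _ in ranked]

print(alphabetical(ENTRIES))
print(by_inbound_links(ENTRIES))
```

In this sketch the site with the most incoming links moves to the top, while the alphabetical listing ignores link counts completely.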

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
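
The mechanism described above can be sketched in a few lines. This is only a toy model under stated assumptions: the pages and their URLs are invented, the "robot" is replaced by a ready-made dictionary of collected texts, and a query is treated as an implicit AND of its words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected earlier.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science courses",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the index that was built beforehand, not by visiting the pages at query time, which is why such a system can only find pages that its robot has already collected.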

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
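
The two ranking criteria above (many pages point to a page, and important pages point to it) can be illustrated with a simplified iterative link score. This is a sketch in the spirit of link analysis, not the actual Google algorithm. The link graph, the damping value and the number of iterations are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; a page gains score from pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph, C ranks highest because three pages point to it, and A ranks above B and D because the important page C points to it.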

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
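
What such an expansion amounts to can be sketched as follows. The synonym table is a tiny invented example and the OR notation of the output is an assumption. How Google selects synonyms internally is not described here.

```python
# Hypothetical, tiny synonym table; a real system would use a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word marked with a leading tilde into an OR group."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(base)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded. The other words of the query are left unchanged.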

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
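
Recall and precision can be computed as follows for one search. The document identifiers are invented, and the sketch assumes that the set of all relevant documents is known, which in practice is rarely the case.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents has been retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5", "d6", "d7"]

print(precision_recall(RETRIEVED, RELEVANT))
```

Here two of the four retrieved documents are relevant, and two of the five relevant documents have been retrieved.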

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
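
For readers who do not know truncation, the following minimal sketch shows what manual truncation does in systems that support it. The asterisk as truncation symbol and the sample text are assumptions for illustration. Stemming would go further by reducing each word to its stem automatically.

```python
import re

def truncation_pattern(term):
    """Turn a search term into a regular expression; a trailing * truncates."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matches(term, text):
    """Return the distinct words in the text that the term matches."""
    found = truncation_pattern(term).finditer(text)
    return sorted({match.group(0).lower() for match in found})

TEXT = "The library and other libraries employ a librarian."

print(matches("librar*", TEXT))
print(matches("library", TEXT))
```

Without truncation, the searcher has to enter each word form separately, for instance combined with OR.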

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
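
A proximity operator such as NEAR can be pictured with the following sketch. The default maximum distance of 5 words and the sample sentence are assumptions. Retrieval systems that offer NEAR each define their own distance.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if both words occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

DOC = "Open access journals are indexed in a directory of scholarly journals"

print(near(DOC, "open", "journals", 2))
print(near(DOC, "open", "directory", 3))
```

A plain AND query would accept both word pairs, since it ignores how far apart the words are in the document.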

96

**--

Meta-search systems:
scheme 1
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
User (In / Out)
Client computer + multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
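
Clustering on the fly can be pictured with the toy sketch below, which groups the snippets of a result list under a word that several of them share. This is not the method used by Vivisimo, which is not described here. The stop word list and the snippets are invented, and the query "pascal" follows the ambiguity example used for Teoma.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared word they contain."""
    query_words = set(re.findall(r"\w+", query.lower()))
    word_sets = []
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        word_sets.append(words - query_words - STOPWORDS)
    # Count in how many snippets each remaining word occurs.
    doc_freq = Counter(word for words in word_sets for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        shared = [word for word in words if doc_freq[word] > 1]
        if shared:
            # The most frequent word wins; ties are broken alphabetically.
            label = min(shared, key=lambda word: (-doc_freq[word], word))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets returned for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pensees by the philosopher Blaise Pascal",
    "Free Pascal compiler for the programming language",
    "Pascal unit of pressure",
]

CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, len(members))
```

The grouping uses only the retrieved snippets, so nothing has to be classified before the search is made.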

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
Client computer + multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
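
The integration step with removal of repeated results can be sketched as follows. The round-robin interleaving, the URL normalisation rules and the two result lists are assumptions for illustration. Each meta-search system has its own way of merging.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a simple key so that trivially different forms coincide.
    The scheme, a leading "www.", a trailing slash and any query string
    are ignored in this sketch."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/x"]
ENGINE_2 = ["http://vub.ac.be/BIBLIO", "http://example.org/y"]

print(merge_results([ENGINE_1, ENGINE_2]))
```

The page that both systems return appears only once in the merged list, at the position of its first occurrence.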

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet (telnet, ftp, ... and the WWW), with the following kinds of sources:]
»Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
»Databases and file archives accessible through the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files
»PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
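
How a tracking service might decide that a page has been modified can be sketched as follows. This is a minimal sketch under stated assumptions: the functions `fingerprint` and `has_changed` are hypothetical names, fetching the page over the network is left out, and real services may compare pages in a more refined way.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a short fingerprint of the page text, ignoring layout whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint: str, current_text: str) -> bool:
    """Compare the stored fingerprint with the fingerprint of the current text."""
    return fingerprint(current_text) != stored_fingerprint


# The service stores the fingerprint at the first visit ...
stored = fingerprint("Open access   sources\n2004 edition")
# ... and compares it with the text found at a later visit.
unchanged = has_changed(stored, "Open access sources 2004 edition")
changed = has_changed(stored, "Open access sources 2005 edition")
```

Only the fingerprint needs to be stored per tracked page; when `has_changed` returns true, the service can send an alert to the subscriber.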

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
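
Matching new books against a stored interest profile can be sketched as follows. The classes `InterestProfile` and `NewBook`, the function `alerts` and the example titles are all hypothetical; the sketch only illustrates the idea of a profile made of keywords and subject categories, not the method of any bookshop.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class InterestProfile:
    keywords: Set[str] = field(default_factory=set)
    subject_categories: Set[str] = field(default_factory=set)


@dataclass
class NewBook:
    title: str
    category: str


def matches(profile: InterestProfile, book: NewBook) -> bool:
    """A book matches when a keyword occurs in its title or its category is in the profile."""
    title_words = {word.strip(".,:;!?").lower() for word in book.title.split()}
    keyword_hit = any(keyword.lower() in title_words for keyword in profile.keywords)
    categories = {category.lower() for category in profile.subject_categories}
    return keyword_hit or book.category.lower() in categories


def alerts(profile: InterestProfile, new_books: List[NewBook]) -> List[str]:
    """Return the titles that the user should be alerted about."""
    return [book.title for book in new_books if matches(profile, book)]


# Invented profile and invented new titles, for illustration only.
profile = InterestProfile(keywords={"oceanography"},
                          subject_categories={"Library science"})
new_books = [
    NewBook("Introduction to Oceanography", "Earth sciences"),
    NewBook("Cataloguing Basics", "Library Science"),
    NewBook("Garden Design", "Gardening"),
]
to_announce = alerts(profile, new_books)
```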

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
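
Simultaneous searching, and its dependence on the availability of the target systems, can be sketched as follows. The two catalogue functions are stand-ins invented for this sketch; a real service would instead send the query to remote catalogues (for instance using Z39.50), which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Catalogue = Callable[[str], List[str]]


def search_all(query: str, catalogues: Dict[str, Catalogue]) -> Dict[str, List[str]]:
    """Send one query to all catalogues in parallel and collect the answers."""
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in catalogues.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=10)
            except Exception:
                # An unavailable or slow target system simply yields no results.
                results[name] = []
    return results


def library_a(query: str) -> List[str]:
    # Stand-in for a library catalogue that answers normally.
    return [f"{query.capitalize()}: an introduction"]


def bookshop_b(query: str) -> List[str]:
    # Stand-in for a target system that is not available.
    raise ConnectionError("target system unavailable")


found = search_all("marine biology",
                   {"library_a": library_a, "bookshop_b": bookshop_b})
```

The common query has to be simple enough for every target system, which is one reason why the search options of such services are limited.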

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term: the central entry, linked to
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and can then use these to formulate a query for that
database (to increase recall and precision).
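
Using thesaurus relations to formulate a query can be sketched as follows. The thesaurus entry for "sea" is invented for this illustration, and the `OR` syntax of the resulting query is an assumption; the actual query syntax depends on the database that is searched.

```python
from typing import Dict, List, Sequence

# A toy thesaurus with one hypothetical entry, using the relations
# UF (synonyms), NT (narrower), BT (broader) and RT (related).
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal sea"],
        "BT": ["water body"],
        "RT": ["marine science"],
    },
}


def expand_query(term: str,
                 thesaurus: Dict[str, Dict[str, List[str]]],
                 relations: Sequence[str] = ("UF", "NT")) -> str:
    """Combine a term with its synonyms and narrower terms into one OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Terms of more than one word are quoted as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms is meant to increase recall; replacing a vague term by a narrower term found in the thesaurus is one way to increase precision.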

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
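
The two directions of citation searching can be sketched with a small citation graph. The document identifiers below are invented, and the functions `snowball` and `citing_documents` are hypothetical names; a real citation index is of course far larger and is built by its producer.

```python
from typing import Dict, List

# For each document: the list of documents that it cites (invented data).
REFERENCES: Dict[str, List[str]] = {
    "seed": ["old_1", "old_2"],
    "old_1": ["older_1"],
    "new_1": ["seed"],
    "new_2": ["seed", "old_1"],
}


def snowball(seed: str, references: Dict[str, List[str]], depth: int = 2) -> List[str]:
    """Follow the reference lists backwards in time, starting from the seed."""
    found: List[str] = []
    frontier = [seed]
    for _ in range(depth):
        next_frontier: List[str] = []
        for document in frontier:
            for cited in references.get(document, []):
                if cited != seed and cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def citing_documents(target: str, references: Dict[str, List[str]]) -> List[str]:
    """Look up the more recent documents that cite the target (citation index)."""
    return sorted(doc for doc, cited in references.items() if target in cited)


older = snowball("seed", REFERENCES)
newer = citing_documents("seed", REFERENCES)
```

The length of `newer` is the number of citations received by the seed document in this small collection.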

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis from Past to Future, with the seed document at Now. Snowball citation searching leads to the past; citation indexing leads to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
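
Generating the variants of a URL to search for can be sketched as follows. The function `url_variants` is a hypothetical helper; it only prepares the strings, and the query syntax needed to search for links to them must still be taken from the help pages of the chosen search system.

```python
from typing import List


def url_variants(url: str) -> List[str]:
    """List the common ways in which the same page or site is linked to."""
    # Drop the scheme (for example "http://") so that only host and path remain.
    address = url.split("://", 1)[-1]
    variants = [address]
    for index_name in ("index.html", "index.htm"):
        if address.endswith("/" + index_name):
            directory = address[: -len(index_name)]
            variants.append(directory)              # with trailing slash
            variants.append(directory.rstrip("/"))  # without trailing slash
            break
    else:
        # No index file in the URL: add the form with or without the slash.
        if address.endswith("/"):
            variants.append(address.rstrip("/"))
        else:
            variants.append(address + "/")
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be searched for separately, and the results can be combined.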

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 63

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION

• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
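As a small illustration of browsing a directory along its tree structure, the following Python sketch models a directory as nested categories. The category names, the example.org addresses and the function name browse are invented for this sketch; real directories are far larger and often allow cross-links between branches.

```python
# Hypothetical fragment of a subject directory, modelled as a nested dict.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydro"]},
    },
    "Reference": {"Dictionaries": ["http://example.org/dict"]},
}

def browse(directory, path):
    """Follow a list of category names down the tree and return what is there."""
    node = directory
    for category in path:
        node = node[category]
    # A dict holds subcategories; a list holds the selected sites themselves.
    return sorted(node) if isinstance(node, dict) else node

print(browse(DIRECTORY, []))
print(browse(DIRECTORY, ["Science"]))
print(browse(DIRECTORY, ["Science", "Biology", "Marine biology"]))
```

No query has to be formulated: the user only chooses among the categories offered at each level.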

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
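The simplified rule stated on this slide (more links received, higher rank) can be sketched in Python as follows. The link data and the function name rank_by_inlinks are invented for illustration; this is only the simplified rule, not Google's actual method.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-c": [],
    "site-d": ["site-c", "site-b"],
}

def rank_by_inlinks(links):
    """Order sites by number of links received; ties are broken alphabetically."""
    received = Counter({site: 0 for site in links})
    for targets in links.values():
        for target in targets:
            received[target] += 1
    return sorted(received, key=lambda site: (-received[site], site))

print(rank_by_inlinks(LINKS))
```

An alphabetical listing would put site-a first; the link-based ordering puts the most linked-to site first instead.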

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the complete contents of
computers on the real Internet in real time
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
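As a rough illustration of this mechanism, the following Python sketch builds a small index from a few pages that are assumed to have been collected already, and answers queries from that index rather than from the live Internet. The page texts, the example.org addresses and the function names are hypothetical; real search engines use far more elaborate crawling, indexing and ranking.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" (crawler) is assumed to have collected already.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Fishing statistics for the North Sea",
}

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words, sorted alphabetically."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "north sea"))
print(search(INDEX, "marine"))
```

The query is answered by a lookup in the prepared index, which is why such systems respond quickly but can only return pages that the robot has already visited.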

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a query could contain at most 10 words.)
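The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a small iterative calculation in Python. This is a textbook-style sketch, not Google's actual implementation: the link graph, the function name link_rank, the damping value 0.85 and the number of iterations are all assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    and especially highly scored pages, link to it."""
    pages = sorted(links)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub": ["popular", "niche"],
    "popular": ["endorsed"],
    "niche": ["popular"],
    "endorsed": ["popular"],
    "lonely": ["niche"],
}

scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page}: {scores[page]:.3f}")
```

In this invented graph, "endorsed" receives only one link, but from the highest scoring page, and therefore ends up above "niche", which receives two links from low scoring pages.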

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
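To make the idea of automatic query expansion concrete, here is a minimal Python sketch that rewrites a query in which a tilde marks the words to be expanded. The tiny thesaurus and the function name expand_query are invented for illustration; how Google actually selects synonyms is not described here.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
```

Only the marked word is expanded; the other words of the query are passed on unchanged.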

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
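The advice to use more than one index can be illustrated with a short Python sketch that compares the result sets of two indexes for the same query. The result sets, the index names and the function name coverage_report are invented for this example.

```python
def coverage_report(results_by_engine):
    """Show how much each engine contributes to the combined result set."""
    combined = set().union(*results_by_engine.values())
    report = {}
    for engine, results in results_by_engine.items():
        # Everything found by the other engines.
        others = set().union(
            *(r for name, r in results_by_engine.items() if name != engine)
        )
        share = len(results) / len(combined) if combined else 0.0
        report[engine] = {
            "share_of_combined": share,
            "unique_hits": sorted(results - others),
        }
    return report

# Hypothetical result sets for one query from two Internet indexes.
RESULTS = {
    "index_1": {"p1", "p2", "p3"},
    "index_2": {"p3", "p4"},
}
report = coverage_report(RESULTS)
for engine, figures in report.items():
    print(engine, figures)
```

Each index contributes hits that the other one misses, so neither of them alone yields the combined set.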

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
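Recall and precision, the two measures named on this slide, can be computed as in the following Python sketch. The document identifiers and the two result lists are invented; in practice the full set of relevant documents is rarely known exactly.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents has been retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: d* are relevant documents, x* are irrelevant ones.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))
print(precision_recall(specialised_engine, relevant))
```

In this invented case the specialised system scores better on both measures, which is the situation the slide has in mind.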

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
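For readers who do not know these features from other retrieval systems, the following Python sketch shows what truncation and a proximity operator do. The function names, the sample sentence and the default distance of 3 words are assumptions made for this illustration; the sketch does not reproduce the syntax of any particular search system.

```python
import re

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncated(text, pattern):
    """True if any word starts with the stem of a pattern like 'librar*'."""
    stem = pattern.rstrip("*").lower()
    return any(word.startswith(stem) for word in tokenize(text))

def near(text, word1, word2, distance=3):
    """True if word1 and word2 occur within `distance` words of each other."""
    words = tokenize(text)
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)

doc = "Libraries offer open access to marine science journals"
print(matches_truncated(doc, "librar*"))
print(near(doc, "open", "journals", distance=3))
print(near(doc, "open", "journals", distance=5))
```

With truncation, one query term covers library, libraries, librarian and so on; with a proximity operator, the searcher can require that two words occur close together instead of anywhere in the document.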

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
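Clustering of results on the fly can be illustrated with a deliberately naive Python sketch: the retrieved titles are grouped under the word that they share most widely, using only the result list itself. This is an invented stand-in and not the method used by Vivisimo; the titles, the stop word list and the function name cluster_results are assumptions for this example, which uses the ambiguous query pascal.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    doc_words = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        doc_words.append(words - query_words - STOPWORDS)
    # Document frequency: in how many results does each word occur?
    frequency = Counter(word for words in doc_words for word in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, doc_words):
        if words:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical titles retrieved for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal, philosopher: a biography",
    "Pascal programming language tutorial",
    "Pascal the philosopher and mathematician",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", RESULTS)
for label, group in clusters.items():
    print(label, "->", group)
```

Nothing is prepared before the search: the headings are derived from the retrieved results only, which is the point made on this slide.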

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
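The integration of results with removal of repeated results can be sketched in Python as follows. The two engine functions are invented stand-ins that return fixed lists; a real meta-search system would send the query to remote search systems over the network, and the merging rule used here (interleaving by rank) is only one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel, then merge results without duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max(len(results) for results in result_lists)
    # Interleave by rank so that each engine's top hits come first.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(meta_search("marine biology", [engine_one, engine_two]))
```

The page found by both engines appears only once in the merged list, and the user has to learn only this one interface.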

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
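
The difference between a modified page and a new page can be illustrated with a small Python sketch. It assumes that the service stores a fingerprint (here a SHA-256 digest) of each page it has seen before; how the real alert services detect changes is not documented here, and the names `fingerprint` and `check_pages` are hypothetical.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a digest that changes whenever the page content changes."""
    return hashlib.sha256(page_bytes).hexdigest()


def check_pages(stored, fetched):
    """Compare freshly fetched pages with stored fingerprints.

    stored:  dict mapping URL -> fingerprint from the previous visit
    fetched: dict mapping URL -> page content (bytes) from this visit
    Returns two sorted lists of URLs: (modified pages, new pages).
    """
    modified = sorted(
        url for url, body in fetched.items()
        if url in stored and stored[url] != fingerprint(body)
    )
    new = sorted(url for url in fetched if url not in stored)
    return modified, new
```

Fetching the pages and sending the alert by email are left out on purpose; only the comparison step is shown.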

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ states that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
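
A minimal Python sketch of simultaneous searching follows. It assumes that each target catalogue is represented by a function that takes a query and returns a list of hits; these functions are hypothetical stand-ins, since the real protocols (for instance Z39.50) are not shown. The sketch also illustrates why the result depends on the availability of the target systems: a target that fails is reported separately instead of stopping the whole search.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, targets):
    """Send one query to several target systems in parallel.

    targets: dict mapping a target name to a function(query) -> list of hits.
    Returns (results, failures): hits per target that answered,
    and an error message per target that did not.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # target unavailable or query rejected
                failures[name] = str(exc)
    return results, failures
```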

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
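
The procedure above (look up a term in a thesaurus first, then build the query) can be sketched in Python. The tiny thesaurus passed to the function is invented for illustration only, and the names `expand_term` and `or_query` are hypothetical; the OR syntax of the resulting query is an assumption that must be adapted to the database that is searched.

```python
def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Collect a term plus the terms linked to it by the given relations.

    thesaurus: dict mapping a term to a dict of relation -> list of terms,
    with the relation codes BT, NT, RT and UF.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms


def or_query(terms):
    """Combine terms with OR; multi-word terms are quoted as exact phrases."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; choosing a more specific term found in the thesaurus is what helps precision.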

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
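
Both directions of citation searching can be shown with a small Python sketch. It assumes that the reference list of each document is available as a dictionary; the function names `snowball` and `build_citation_index` are hypothetical. Following reference lists leads to older publications (the snowball effect), while inverting the same data gives a citation index that leads to more recent publications.

```python
from collections import defaultdict


def snowball(seed, references):
    """Return all older documents reachable from the seed document
    by following reference lists repeatedly (snowball effect).

    references: dict mapping a document to the list of documents it cites.
    """
    found, stack = set(), [seed]
    while stack:
        document = stack.pop()
        for cited in references.get(document, ()):
            if cited not in found:
                found.add(cited)
                stack.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document,
    the set of (more recent) documents that cite it."""
    cited_by = defaultdict(set)
    for document, cited_documents in references.items():
        for cited in cited_documents:
            cited_by[cited].add(document)
    return cited_by
```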

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
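
The variants listed above can be generated automatically. The following Python sketch is only an illustration: the function name `url_variants` is hypothetical, and the assumption that the default page is called index.html does not hold for every web server.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the forms under which the same page may be linked:
    with the default file name, with a trailing slash, and without it."""
    base = url
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # strip the default file name
            break
    base = base.rstrip("/")
    variants = [base + "/" + name for name in index_names]
    variants += [base + "/", base]
    return variants
```

Each variant can then be submitted as a separate link query, and the results combined.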

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 64

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: a global subject directory, which covers the complete WWW,
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
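
The mechanism described above can be sketched in a few lines. This is a minimal illustration only, assuming a tiny in-memory collection of pages instead of a real crawler; the names `PAGES`, `build_index` and `search`, and the example.org addresses, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word points to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Stand-in for the pages that a "robot" would have collected beforehand.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}

if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "marine biology"))
```

The point of the sketch is that a query is answered from the index that was built in advance, not by visiting the pages at query time.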

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, the other leading
Internet database and search engine AltaVista, and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
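
To make the two ranking factors above more concrete, here is a simplified link-analysis sketch in the style of the published PageRank idea. It is NOT the algorithm that Google actually runs; the function name `link_rank`, the damping value 0.85, the number of iterations and the tiny link graph are all assumptions chosen for illustration.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Score pages from the links between them (simplified power iteration)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: A, B and D all point to C; C points only to A.
    graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(link_rank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, C ranks highest because many pages point to it, and A ranks above B and D because the "important" page C points to it.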

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
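
Manual, intellectual expansion of a query can be sketched as follows. The synonym list is supplied by the searcher (for instance after consulting a thesaurus); the function name `expand_query`, the sample synonyms, and the use of OR with parentheses are assumptions, since the query syntax differs from one retrieval system to another.

```python
def expand_query(words: list[str], synonyms: dict[str, list[str]]) -> str:
    """Replace each word that has synonyms by an OR-group of alternatives."""
    parts = []
    for word in words:
        alternatives = [word] + synonyms.get(word, [])
        if len(alternatives) == 1:
            parts.append(word)  # no synonyms known: keep the word as it is
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical synonyms chosen by the searcher.
    chosen = {"marine": ["ocean", "sea"]}
    print(expand_query(["marine", "pollution"], chosen))
```

Unlike the automatic tilde method, the searcher decides here which synonyms really fit the information need.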

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
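
The two quality measures named above, recall and precision, can be computed as in the following sketch. The document identifiers and the function name `precision_recall` are hypothetical; in practice the complete set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Precision: which fraction of the retrieved documents is relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    found = {"d1", "d2", "d3", "d4"}   # what the search engine returned
    wanted = {"d1", "d2", "d5"}        # what the searcher really needed
    print(precision_recall(found, wanted))
```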

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
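
For readers who are not familiar with truncation, the sketch below shows what right-hand truncation does in systems that do offer it. The asterisk as truncation symbol, the function name `truncation_match` and the small vocabulary are assumptions for this example.

```python
def truncation_match(pattern: str, vocabulary: list[str]) -> list[str]:
    """Right-hand truncation: 'librar*' matches every word starting with 'librar'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


if __name__ == "__main__":
    words = ["library", "libraries", "librarian", "liberty"]
    print(truncation_match("librar*", words))
    print(truncation_match("library", words))
```

Without this feature, the searcher has to type each word variant explicitly in the query.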

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
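
A proximity operator such as NEAR can be illustrated with the following sketch, which checks whether two words occur within a given number of word positions of each other. The function name `near` and the default distance of 3 are assumptions; systems that offer such an operator define the distance in their own way.

```python
import re


def near(text: str, word_a: str, word_b: str, max_distance: int = 3) -> bool:
    """True if word_a and word_b occur at most max_distance positions apart."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(tokens) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(tokens) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


if __name__ == "__main__":
    sentence = "open access to scientific journals"
    print(near(sentence, "open", "journals", max_distance=4))
    print(near(sentence, "open", "journals", max_distance=3))
```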

96

**--

Meta-search systems:
scheme 1
[Diagram with the labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
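
Clustering "on the fly" can be illustrated with a deliberately naive sketch: each retrieved snippet is filed under the non-query word that it shares with the largest number of other snippets. This is NOT the method used by Vivisimo, whose algorithm is not described here; the stop word list, the function name `cluster_results` and the sample snippets are assumptions.

```python
import re
from collections import Counter, defaultdict

# Small, assumed list of words that carry too little meaning to label a cluster.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "his"}


def cluster_results(query: str, snippets: list[str]) -> dict[str, list[str]]:
    """Group result snippets under a label word, computed at search time."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokenized = []
    for snippet in snippets:
        words = [w for w in re.findall(r"[a-z]+", snippet.lower())
                 if w not in STOPWORDS and w not in query_words]
        tokenized.append(words)
    # Count in how many snippets each word occurs.
    doc_freq = Counter(w for words in tokenized for w in set(words))
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        if words:
            # Label: the most widely shared word; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    hits = [
        "Pascal programming language tutorial",
        "Blaise Pascal philosopher and mathematician",
        "Free Pascal compiler for the programming language",
        "Pascal the philosopher and his wager",
    ]
    for label, group in cluster_results("pascal", hits).items():
        print(label, "->", group)
```

No index of the documents is prepared in advance: the grouping is derived only from the results returned for this one query.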

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than only 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
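
The integration step mentioned above can be sketched as follows: ranked result lists from several search systems are interleaved and repeated URLs are dropped. The round-robin merging strategy, the function names and the example addresses are assumptions; real meta-search systems may weigh and merge results quite differently.

```python
def normalize(url: str) -> str:
    """Treat URLs that differ only by a trailing slash as the same result."""
    return url.rstrip("/")


def merge_results(result_lists: list[list[str]]) -> list[str]:
    """Interleave ranked result lists round-robin and remove repeated URLs."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = normalize(results[rank])
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


if __name__ == "__main__":
    engine_1 = ["http://a.example.org/", "http://b.example.org"]
    engine_2 = ["http://b.example.org/", "http://c.example.org", "http://a.example.org"]
    print(merge_results([engine_1, engine_2]))
```

Only the trailing slash is normalised here, because the path of a URL can be case-sensitive.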

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison

Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet includes the WWW as well as telnet, ftp, ...]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
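The difference between a modified page and a new page can be sketched as follows. This is a minimal illustration, assuming the tracking system stores a fingerprint (here a SHA-256 hash) of each page from its previous run; the function names are hypothetical and the page texts are passed in directly instead of being downloaded.

```python
import hashlib

def fingerprint(page_text):
    """Return a short, fixed-length summary of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(current_pages, stored_fingerprints):
    """Classify each URL as 'new', 'modified' or 'unchanged'.

    current_pages: dict of URL -> page text as found now
    stored_fingerprints: dict of URL -> fingerprint from the previous run
                         (updated in place for the next run)
    """
    report = {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored_fingerprints:
            report[url] = "new"
        elif stored_fingerprints[url] != digest:
            report[url] = "modified"
        else:
            report[url] = "unchanged"
        stored_fingerprints[url] = digest
    return report

stored = {}
first_run = check_pages({"http://site.example/a": "version 1"}, stored)
second_run = check_pages({"http://site.example/a": "version 2",
                          "http://site.example/b": "hello"}, stored)
```

An alert service would then send an email for every URL reported as new or modified.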

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
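How a keyword-based interest profile could be matched against newly published titles is sketched below. This is a simplified illustration with hypothetical user names and titles; it is not a description of how Amazon or any other bookshop implements its alerts.

```python
import re

def matches_profile(title, keywords):
    """True when at least one profile keyword occurs as a word in the title."""
    words = set(re.findall(r"\w+", title.lower()))
    return any(keyword.lower() in words for keyword in keywords)

def alerts_for(new_titles, profiles):
    """profiles: dict of user -> list of keywords stored on the server.
    Returns a dict of user -> list of new titles that fit the profile."""
    return {
        user: [title for title in new_titles if matches_profile(title, keywords)]
        for user, keywords in profiles.items()
    }

new_titles = ["Marine Ecology: An Introduction", "Cooking for Beginners"]
profiles = {"reader1": ["ecology", "plankton"], "reader2": ["gardening"]}
alerts = alerts_for(new_titles, profiles)
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the words in its title.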

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at the time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term in a thesaurus:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
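The procedure described above can be sketched as follows: the searcher looks up a term in a thesaurus and combines the term with its synonyms (UF) and narrower terms (NT) in a Boolean OR query. The thesaurus entry below is a hypothetical toy example, and the query syntax (OR, double quotes for phrases) is an assumption that has to be adapted to the database that is searched.

```python
# Hypothetical toy thesaurus: term -> relations, using the usual labels.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal waters"],
        "RT": ["marine life"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the
    chosen thesaurus relations. Phrases are put between double quotes."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term from the thesaurus mainly increases precision.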

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
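Both directions can be illustrated with a small sketch, assuming a hypothetical table `REFERENCES` that lists for each document the documents it cites. Following the reference lists gives the snowball search towards older work; inverting the table gives what a citation index offers, namely the more recent documents that cite the seed.

```python
# Hypothetical data: document -> documents cited in its reference list.
REFERENCES = {
    "seed2000": ["old1990", "old1995"],
    "old1995": ["old1990", "old1985"],
    "new2003": ["seed2000"],
    "new2004": ["seed2000", "new2003"],
}

def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {cited for doc in frontier
                    for cited in references.get(doc, [])} - found
        found |= frontier
    found.discard(seed)
    return found

def citing_documents(seed, references):
    """Return the documents whose reference list contains the seed,
    which is the lookup that a citation index makes possible."""
    return {doc for doc, cited in references.items() if seed in cited}

older = snowball("seed2000", REFERENCES)
newer = citing_documents("seed2000", REFERENCES)
```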

211

***-

Information retrieval:
using citations (scheme)
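[Scheme: a time axis running from past over now to future, with the seed document on it; snowball citation searching leads from the seed document to the past, citation indexing leads from the seed document towards the future.]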

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
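The variants listed above can be generated automatically before the link searches are carried out. The sketch below is a minimal illustration; it assumes that the default document of the site is named index.html (or index.htm), which differs from server to server, and the function name is hypothetical.

```python
def url_variants(url):
    """Return the spellings of one page address that other sites may link to:
    with the default document, with a trailing slash, and without both."""
    base = url
    # Assumption: the default document is called index.html or index.htm.
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant is then used in a separate link search, and the result sets are combined.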

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 65

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems
A global subject directory, which covers the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
across the live Internet, completely and in real time,
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
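The mechanism described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the pages are given as a ready-made dictionary (a real "robot" would collect them from the Internet), and the names `build_index` and `search` are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build a word index from pages given as a dict: URL -> text of the page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a crawler would have collected.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
}
index = build_index(pages)
```

Note that the query is answered from the index that was built beforehand, not by visiting the pages at search time.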

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
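The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link analysis in Python. This is only a sketch of the general idea, not the actual algorithm used by Google; the function name `link_rank`, the damping value and the tiny link graph are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict: page -> list of pages that it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b and d link to c; c links back to a.
ranks = link_rank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this example page c scores highest because many pages point to it, and page a scores higher than b and d because the "important" page c points to it.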

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
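What such an automatic expansion amounts to can be sketched as follows. The synonym list is a hypothetical stand-in for whatever the search system uses internally, and `expand_query` is a name invented for this illustration; the sketch only shows the principle of replacing a marked word by an OR-group of the word and its synonyms.

```python
def expand_query(query, synonyms):
    """Replace each word marked with a tilde by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


# Hypothetical synonym list, for illustration only.
synonyms = {"fishing": ["angling", "fisheries"]}
expanded = expand_query("sustainable ~fishing policy", synonyms)
```

A manual, intellectual expansion would follow the same pattern, but with alternatives chosen by the searcher.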

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
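Recall and precision, the two measures mentioned above, can be computed for a single search when it is known which documents are relevant. The following minimal sketch uses the standard definitions; the document numbers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = fraction of the retrieved documents that are relevant
    recall    = fraction of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 6 relevant documents exist, 2 in common.
precision, recall = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```

In practice the complete set of relevant documents on the WWW is not known, so recall can only be estimated.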

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: the user works at a client computer with a WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems. Labels: In, Out.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user works at a client computer with a multi-threaded Internet search client program; the Internet / WWW; WWW server computers with Internet search systems. Labels: In, Out.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user works at a client computer with a WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems. Labels: In, Out.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
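
The client-based scheme above relies on a multi-threaded client program that sends one query to several Internet search systems at the same time. The following is only a minimal sketch of that idea, not the way Copernic or any other product is actually built: the two search functions are hypothetical stand-ins that return canned results, where a real client would contact the search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


def search_system_1(query):
    # Hypothetical stand-in for a request to a first Internet search system.
    return [f"result for '{query}' from system 1"]


def search_system_2(query):
    # Hypothetical stand-in for a request to a second Internet search system.
    return [f"result for '{query}' from system 2"]


def meta_search(query, systems):
    """Send the same query to all search systems in parallel threads.

    `systems` maps a (hypothetical) system name to a search function.
    Returns a dict that maps each system name to its list of results.
    """
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in systems.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access",
                      {"system 1": search_system_1,
                       "system 2": search_system_2})
```

Because the queries run in parallel, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.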

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
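
How such an integration with removal of repeated results might work can be sketched as follows. This is an illustration under assumptions, not the method of any particular meta-search system: the function names are hypothetical, results are assumed to be plain URLs in ranked order, and two URLs count as the same result when they differ only in upper/lower case of the host name, a leading "www." or a trailing slash.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparison key (assumed rule, see lead-in)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize_url(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com"]
merged = merge_results([engine_1, engine_2])
```

Here the two spellings of the Dogpile address are recognised as one result, so the merged list contains three entries instead of four.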

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Internet: telnet, ftp, ... and the WWW

• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Word files
• PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only
when passwords are used
• Rapidly changing information,
such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
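
The difference between a modified page and a new page can be illustrated with a small sketch. It assumes that the tracking service stores a fingerprint (a hash value) of each page it has seen before; this is one common approach and not necessarily how any of the services mentioned here works. The function names and the example URLs are hypothetical, and the page texts are given directly instead of being downloaded.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text, ignoring differences in white space only."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_pages(previous, current):
    """Compare stored fingerprints with the pages found now.

    `previous` maps URL -> fingerprint stored at the last visit.
    `current` maps URL -> page text retrieved at this visit.
    """
    report = {"new": [], "modified": [], "unchanged": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(text):
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report


stored = {
    "http://example.org/a": fingerprint("Open access sources"),
    "http://example.org/b": fingerprint("Old text"),
}
found_now = {
    "http://example.org/a": "Open   access sources",
    "http://example.org/b": "Updated text",
    "http://example.org/c": "A page that did not exist before",
}
report = classify_pages(stored, found_now)
```

An alert service would then send the "new" and "modified" lists to the subscriber by email.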

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
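
The procedure described above (look up terms in a thesaurus first, then formulate the query) can be sketched as follows. The tiny thesaurus is a made-up illustration, not taken from any real thesaurus system, and the Boolean syntax with AND, OR and quoted phrases is an assumption; the database to be searched may require a different syntax.

```python
# Hypothetical thesaurus entries, using the usual relation codes.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["lagoon"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms for the chosen relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def quote(term):
    """Put multi-word terms between quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(concepts, thesaurus, relations=("UF", "NT")):
    """Combine alternatives with OR within a concept, concepts with AND."""
    groups = []
    for concept in concepts:
        terms = [quote(t) for t in expand_term(concept, thesaurus, relations)]
        groups.append(terms[0] if len(terms) == 1
                      else "(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


query = build_query(["sea", "pollution"], THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; adding broader terms (BT) would widen the search further, usually at the cost of precision.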

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
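
Both directions of citation searching can be illustrated on a small, invented citation network. In this sketch the dictionary `CITES` is a hypothetical example that maps each document to the documents in its reference list; a real citation index holds the same kind of data for millions of documents.

```python
# Hypothetical citation data: document -> documents cited in its references.
CITES = {
    "D": ["B", "C"],   # the seed document D cites B and C
    "B": ["A"],
    "C": ["A"],
    "E": ["D"],        # E and F are more recent and cite D
    "F": ["D", "E"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time (snowball effect)."""
    found, queue, seen = [], [seed], {seed}
    while queue:
        document = queue.pop(0)
        for reference in cites.get(document, []):
            if reference not in seen:
                seen.add(reference)
                found.append(reference)
                queue.append(reference)
    return found


def citing_documents(seed, cites):
    """Look forwards in time: which documents cite the seed document?"""
    return sorted(doc for doc, references in cites.items()
                  if seed in references)


older = snowball("D", CITES)
newer = citing_documents("D", CITES)
```

The first function needs only the documents themselves, while the second needs a citation index, because a document cannot list the later publications that will cite it.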

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
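
The variants listed above can also be generated automatically before the link searches are carried out. The sketch below is a hypothetical helper; it assumes that the default page of the web site is named index.html, which differs from server to server.

```python
def url_variants(url, default_page="index.html"):
    """Return the spellings under which other pages may link to a site."""
    base = url
    if base.endswith("/" + default_page):
        # Strip the default page name to get the directory address.
        base = base[: -len(default_page) - 1]
    base = base.rstrip("/")
    return [f"{base}/{default_page}", f"{base}/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then searched separately with the link search syntax of the chosen search system, and the results are combined.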

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can use an Internet
index search engine NOT ONLY with specialised searching in
the hyperlink fields of web pages,
but ALSO with more “normal” searching for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 66

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
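
The two components above (an automatically built index and a search function on top of it) can be sketched as follows. This is a toy illustration, not the design of any real search engine: the three pages and their texts are invented, the names `build_index` and `search` are hypothetical, and combining query words with AND is a choice made for this sketch only.

```python
from collections import defaultdict

# Hypothetical pages already collected by a crawler ("robot").
pages = {
    "http://site-a.example/": "marine biology of the north sea",
    "http://site-b.example/": "hydrology and marine science",
    "http://site-c.example/": "history of the sea",
}

def build_index(docs):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(sorted(search(index, "marine sea")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such systems can respond quickly.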

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
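
The two ranking criteria above (many pages point to it, important pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions and not Google's production algorithm: the link graph is invented, and the function name `link_rank`, the damping factor 0.85 and the number of iterations are choices made for this example.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a dict {page: [linked pages]}.

    Every page must appear as a key of `graph`. Pages without
    outgoing links spread their score evenly over all pages.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for t in receivers:
                new[t] += share
        rank = new
    return rank

# Hypothetical link graph: A, B and C all point to D; D points to A.
graph = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
scores = link_rank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph D ranks highest because many pages point to it, and A ranks above B and C because the "important" page D points to it.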

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
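
What the tilde asks the search system to do can be imitated by hand. A minimal sketch in Python, assuming a small hypothetical synonym table and a hypothetical function `expand_query`; the synonym data and internal processing of Google itself are not reproduced here.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query: str) -> str:
    """Replace each '~word' by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)   # no synonyms known: keep the word
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap hotel brussels"))
# prints: (cheap OR inexpensive OR budget) hotel brussels
```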

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2003.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
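
Recall and precision, mentioned above, are simple ratios. The sketch below computes both for two invented result sets; the document numbers, the two "engines" and the function name `precision_recall` are assumptions made only for this example.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision = relevant hits / retrieved; recall = relevant hits / all relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 10 documents are relevant in total.
relevant = set(range(1, 11))
general_engine = set(range(1, 5)) | set(range(100, 116))   # 4 relevant + 16 irrelevant
specialised_engine = set(range(1, 9)) | {200, 201}          # 8 relevant + 2 irrelevant

print(precision_recall(general_engine, relevant))       # (0.2, 0.4)
print(precision_recall(specialised_engine, relevant))   # (0.8, 0.8)
```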

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
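
To make the idea of clustering "on the fly" more concrete, here is a minimal Python sketch that groups result snippets only after they have been retrieved, without any index built in advance. The grouping rule (label each snippet with the non-query word it shares with the most other snippets) and the small stopword list are simplifying assumptions for illustration; this is not the algorithm used by Vivisimo, which the slides do not describe.

```python
from collections import defaultdict

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "for", "on", "to", "at"}


def cluster_on_the_fly(snippets: list[str], query: str) -> dict[str, list[str]]:
    """Group result snippets under the non-query word they share with most other snippets."""
    query_words = set(query.lower().split())
    words_per_snippet = []
    document_frequency = defaultdict(int)
    for snippet in snippets:
        words = {w.strip(".,;:!?").lower() for w in snippet.split()}
        words -= STOPWORDS | query_words
        words.discard("")
        words_per_snippet.append(words)
        for word in words:
            document_frequency[word] += 1

    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        shared = [w for w in words if document_frequency[w] > 1]
        # Label = most widely shared word; ties are broken alphabetically.
        label = min(shared, key=lambda w: (-document_frequency[w], w)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical snippets for a family-name search.
    hits = [
        "Janssens painter exhibition in Antwerp",
        "Paintings by the painter Janssens",
        "Janssens cyclist wins race",
        "Cyclist Janssens interview",
        "Janssens bakery",
    ]
    for heading, members in cluster_on_the_fly(hits, "Janssens").items():
        print(heading, "->", members)
```

With the made-up snippets above, pages about different persons who share the family name end up under different headings, which is the effect the slide describes.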

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
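
The integration step mentioned above can be sketched in a few lines of Python. The sketch interleaves the ranked lists returned by several primary search systems and keeps only the first copy of each URL. The engine names, the example URLs and the normalisation rule (ignore scheme, a leading "www." and a trailing slash) are assumptions made for illustration, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to a comparison key (assumption: scheme, 'www.' and trailing '/' are ignored)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(results_per_engine: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Interleave ranked lists and drop repeated results.

    Returns (url, engines_that_reported_it) pairs in merged order.
    """
    merged: dict[str, tuple[str, list[str]]] = {}
    longest = max((len(hits) for hits in results_per_engine.values()), default=0)
    for rank in range(longest):
        for engine, hits in results_per_engine.items():
            if rank >= len(hits):
                continue
            key = normalise(hits[rank])
            if key not in merged:
                merged[key] = (hits[rank], [engine])
            elif engine not in merged[key][1]:
                merged[key][1].append(engine)
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical result lists from two primary search systems.
    example = {
        "engine_a": ["http://www.example.org/tides/", "http://example.org/waves"],
        "engine_b": ["http://example.org/tides", "http://example.net/currents"],
    }
    for url, engines in merge_results(example):
        print(url, engines)
```

Keeping track of which engines reported a URL is a design choice: a result found by several primary systems could be ranked higher in the merged list.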

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
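
The difference between a modified page and a new page can be illustrated with a small Python sketch. It assumes that the page texts have already been fetched by some other means and that the service stores one fingerprint per URL; the function names and the use of SHA-256 are illustrative choices, not a description of how any of the services mentioned here work.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable digest of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_pages(current_pages: dict[str, str], known: dict[str, str]) -> dict[str, list[str]]:
    """Compare freshly fetched pages with stored fingerprints.

    `known` maps URL -> fingerprint from the previous run and is updated in place.
    The report lists which URLs are new and which have been modified.
    """
    report = {"new": [], "modified": []}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in known:
            report["new"].append(url)
        elif known[url] != digest:
            report["modified"].append(url)
        known[url] = digest
    return report


if __name__ == "__main__":
    store = {}
    print(check_pages({"http://example.org/a": "version 1"}, store))  # first run: page is new
    print(check_pages({"http://example.org/a": "version 2"}, store))  # later run: page is modified
```

An alert service would run such a check at intervals and send the report to the subscriber by email.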

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most recent two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
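
A stored interest profile of this kind can be sketched in Python as follows. The class names, the fields and the matching rule (a keyword appears in the title, or the book falls in a chosen subject category) are assumptions for illustration and do not describe how Amazon or any other bookshop implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Profile stored on the server for one user (hypothetical structure)."""
    user: str
    keywords: set[str] = field(default_factory=set)
    subject_categories: set[str] = field(default_factory=set)


@dataclass
class NewBook:
    title: str
    subject_category: str


def matches(profile: InterestProfile, book: NewBook) -> bool:
    """A book fits the profile if a keyword occurs in its title or its category was selected."""
    title_words = {w.strip(".,:;").lower() for w in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    category_hit = book.subject_category in profile.subject_categories
    return keyword_hit or category_hit


def alerts_for(profiles: list[InterestProfile], new_books: list[NewBook]) -> dict[str, list[str]]:
    """Return, per user, the titles of newly published books that fit the user's profile."""
    return {p.user: [b.title for b in new_books if matches(p, b)] for p in profiles}


if __name__ == "__main__":
    profiles = [
        InterestProfile("reader_1", keywords={"oceanography"}),
        InterestProfile("reader_2", subject_categories={"Engineering"}),
    ]
    books = [
        NewBook("Introduction to Oceanography", "Earth sciences"),
        NewBook("Bridge design", "Engineering"),
    ]
    print(alerts_for(profiles, books))
```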

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
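
The following Python sketch illustrates simultaneous searching and why the result depends on the availability of the target systems. The two catalogue functions are placeholders that return fixed data or fail on purpose; a real service would query each target over the network, for instance with Z39.50 or HTTP. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalogue_a(query: str) -> list[str]:
    # Placeholder for a real catalogue query; returns fixed demo data.
    return [f"Catalogue A: a book about {query}"]


def search_catalogue_b(query: str) -> list[str]:
    # Placeholder for a target system that is unavailable at search time.
    raise ConnectionError("catalogue B is not reachable")


def search_all(query: str, targets: dict) -> tuple[dict, dict]:
    """Send one query to every target in parallel; return (hits, failures) per target name."""
    hits, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(func, query) for name, func in targets.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result(timeout=10)
            except Exception as error:
                # An unavailable target must not make the whole search fail.
                failures[name] = str(error)
    return hits, failures


if __name__ == "__main__":
    found, failed = search_all("tides", {"a": search_catalogue_a, "b": search_catalogue_b})
    print(found)
    print(failed)
```

Reporting the failed targets separately lets the user see which catalogues were not covered by the search.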

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
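
As a rough sketch of this application, the Python example below stores the thesaurus relations (BT, NT, RT, UF) for one term and uses them to build a broader query for a database that has no controlled descriptors. The entry is a toy example invented for illustration, not taken from a real thesaurus, and the OR syntax is an assumption: the target database may use a different query language.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


# Toy thesaurus with a single, invented entry.
THESAURUS = {
    "sea": ThesaurusEntry(
        "sea",
        broader=["water body"],
        narrower=["coastal sea"],
        related=["tide"],
        used_for=["ocean"],
    ),
}


def expand_query(term: str, thesaurus: dict, include_narrower: bool = True,
                 include_related: bool = False) -> str:
    """Build an OR query from a term, its synonyms and (optionally) narrower and related terms."""
    entry = thesaurus.get(term)
    if entry is None:
        return f'"{term}"'
    terms = [entry.term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    if include_related:
        terms += entry.related
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; related terms are left out by default because they can lower precision.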

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
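
Both directions can be sketched with a small citation graph in Python. The document identifiers and reference lists are invented for illustration. Following reference lists gives the snowball search into the past, and inverting the same data gives a simple citation index that points to more recent documents.

```python
# Toy data: each document maps to the older documents it cites (hypothetical identifiers).
REFERENCES = {
    "paper_2004": ["seed_2000"],
    "paper_2002": ["seed_2000", "paper_1995"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
    "paper_1990": [],
}


def snowball(seed: str, references: dict[str, list[str]]) -> list[str]:
    """Follow reference lists backwards in time, collecting every older document reachable from the seed."""
    found: list[str] = []
    to_visit = list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found


def build_citation_index(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the reference lists: for each document, list the documents that cite it."""
    index: dict[str, list[str]] = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


if __name__ == "__main__":
    print("Older documents:", snowball("seed_2000", REFERENCES))
    print("Cited by:", build_citation_index(REFERENCES)["seed_2000"])
```

The length of a document's entry in the citation index is the number of citations it has received within the data set covered.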

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
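
What a search engine does internally for such a link search can be approximated on a small scale. The Python sketch below takes a set of pages that have already been collected (the HTML strings are made up) and lists the pages that contain a hyperlink to a given target URL. It is a simplified illustration under these assumptions, not the implementation of any search engine.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def citing_pages(target_url: str, pages: dict[str, str]) -> list[str]:
    """Return the URLs of the pages whose HTML contains a link to target_url."""
    target = target_url.rstrip("/")
    citing = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if any(link.rstrip("/") == target for link in collector.links):
            citing.append(page_url)
    return citing


if __name__ == "__main__":
    # Hypothetical collected pages.
    pages = {
        "http://a.example/": '<a href="http://target.example/page.html">a good source</a>',
        "http://b.example/": "<p>no links here</p>",
    }
    print(citing_pages("http://target.example/page.html", pages))
```

The number of pages returned is a crude measure of the impact of the target page; reading the citing pages shows who links to it and what they say about it.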

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
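
The variants listed above can be generated mechanically before running a link search. The following is only a minimal sketch: the function name `url_variants` and the list of default file names are assumptions made for illustration, not part of any search engine.

```python
def url_variants(url: str) -> list[str]:
    """Return the spellings of a URL that citing pages may have linked to."""
    variants = [url]
    # Assumed default file names; a real web server may use others.
    for default_page in ("index.html", "index.htm"):
        if url.endswith("/" + default_page):
            url = url[: -len(default_page)]  # keep the trailing slash
            variants.append(url)
            break
    # Add the form without the trailing slash.
    if url.endswith("/"):
        variants.append(url[:-1])
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each printed variant can then be submitted as a separate link query to the search system of your choice.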

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 67

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
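
The "simply stated" idea above (more inbound links, higher rank) can be sketched as follows. This is an illustration only, with made-up site names; it counts inbound links and nothing more, and it does not claim to reproduce the ranking that Google actually applies in its directory.

```python
def rank_by_inbound_links(sites, links):
    """Order sites by the number of links they receive (most linked first).

    sites: list of site names in one directory category
    links: list of (source_site, target_site) pairs found on the WWW
    """
    counts = {site: 0 for site in sites}
    for source, target in links:
        # Ignore self-links and links to sites outside this category.
        if target in counts and source != target:
            counts[target] += 1
    # Ties are broken alphabetically, like the plain DMOZ listing.
    return sorted(sites, key=lambda site: (-counts[site], site))


# Hypothetical example data.
sites = ["a.example", "b.example", "c.example"]
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "b.example"),
]
print(sorted(sites))                        # alphabetical order
print(rank_by_inbound_links(sites, links))  # link-based order
```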

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: the complete WWW; a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
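
The mechanism described above (an index built in advance from page texts, then searched at query time) can be illustrated with a very small inverted index. This is a sketch under simplifying assumptions: the pages are given as a dictionary instead of being collected by a robot, and all query words must occur in a page (an implicit AND). The function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for what a crawler would collect.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Marine science and hydrology",
}
index = build_index(pages)
print(search(index, "marine hydrology"))
```

The query is answered from the index alone; the pages themselves are not visited again at search time, which is why such systems can respond quickly.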

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
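
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis computation. This is a textbook-style sketch, not the algorithm that Google runs in practice; the damping value 0.85, the number of iterations and the page names are assumptions chosen for the example.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more.

    links: maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: "a" and "b" each receive exactly one link,
# but "a" is linked from the well-linked page "c".
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example "c" ranks first because two pages point to it, and "a" ranks above "b" although both receive a single link, because the link to "a" comes from the more important page.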

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
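
What such an expansion amounts to can be sketched as follows. The synonym table here is a small hypothetical dictionary supplied by hand; how Google selects synonyms internally is not documented in these slides, so this only illustrates the principle that a word marked with a tilde is replaced by a group of alternative terms.

```python
def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[set[str]]:
    """Turn a query into groups of acceptable terms, one group per word.

    A word written as ~word is expanded with its synonyms;
    other words form a group containing only themselves.
    """
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:].lower()
            groups.append({word, *synonyms.get(word, [])})
        else:
            groups.append({token.lower()})
    return groups


# Hypothetical synonym table.
synonyms = {"car": ["automobile", "vehicle"]}
print(expand_query("cheap ~car rental", synonyms))
```

A document then matches when it contains at least one term from every group, so only the marked word is broadened.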

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
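
Recall and precision, mentioned above as the measures on which specialised systems can do better, can be computed as in the following sketch. It assumes that the set of relevant documents is known in advance, which in practice is only the case in test collections; the document identifiers are made up.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = retrieved relevant documents / all retrieved documents
    recall    = retrieved relevant documents / all relevant documents
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant ones exist.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```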

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
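
For readers unfamiliar with truncation, the sketch below shows what a retrieval system with manual right-hand truncation would do with a query term such as `librar*`. It uses wildcard matching from the Python standard library against a small hypothetical vocabulary; it illustrates the feature in general and not any particular search engine.

```python
from fnmatch import fnmatchcase


def truncation_match(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the index terms that match a truncated query term.

    The asterisk stands for any sequence of characters.
    """
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))


# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberal"]
print(truncation_match("librar*", vocabulary))
```

Without this feature, each word form has to be entered as a separate query term.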

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: User (in / out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (in / out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (in / out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
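
Clustering "on the fly" can be illustrated with a deliberately simple Python sketch that uses only the snippets returned for one query, without any index built in advance. This is NOT the algorithm used by Vivisimo, which is not described here. The function `cluster_results`, the stopword list and the sample snippets are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "at", "is"}


def cluster_results(snippets, query):
    """Group result snippets under the most widely shared non-query word."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    term_sets = []
    for snippet in snippets:
        terms = set(re.findall(r"\w+", snippet.lower()))
        term_sets.append(terms - query_terms - STOPWORDS)

    # In how many snippets does each word occur?
    doc_freq = Counter(t for terms in term_sets for t in terms)

    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        shared = [t for t in terms if doc_freq[t] > 1]
        if shared:
            # Most frequent shared word becomes the heading; ties go alphabetically.
            label = min(shared, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for a search on a family name.
snippets = [
    "Smith chemistry professor",
    "Smith chemistry lecture notes",
    "Smith jazz pianist",
    "Smith jazz concert",
    "Smith bakery",
]
for heading, items in cluster_results(snippets, "Smith").items():
    print(heading, items)
```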

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
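
The client-based scheme can be sketched as follows in Python: one client program sends the same query to several search systems in parallel threads and collects the answers. The two engine functions are stand-ins that return made-up `example.org` URLs. A real client would send requests over the Internet, and nothing here reflects how Copernic or any other product is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # Stand-in for a remote search system; a real client would send an HTTP request.
    return [f"http://example.org/one?q={query}"]


def engine_two(query):
    # Second stand-in search system.
    return [f"http://example.org/two?q={query}"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel and collect the hits."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, query) for engine in engines]
        hits = []
        for future in futures:  # results are gathered in the order of `engines`
            hits.extend(future.result())
    return hits


print(meta_search("sea", [engine_one, engine_two]))
```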

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
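
The integration of results with removal of repeated results can be sketched in Python as below. This is an illustrative assumption, not the method of any particular meta-search system: the functions `normalise` and `merge_results` are hypothetical, the rule that `www.` and a trailing slash do not matter is a heuristic, and query strings are ignored.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', plus path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):  # rank 1 of every engine, then rank 2, ...
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


from_engine_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
from_engine_b = ["http://dogpile.com", "http://www.mamma.com"]
print(merge_results([from_engine_a, from_engine_b]))
```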

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
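
How a tracking service might distinguish modified pages from new pages can be sketched in Python. This is a simplified illustration, not the method of any named service: the functions `fingerprint` and `check_pages` are hypothetical, and the page texts are passed in directly instead of being downloaded.

```python
import hashlib


def fingerprint(page_text):
    """Return a hash of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(known, fetched):
    """Compare fetched pages with stored fingerprints.

    known:   dict mapping URL -> fingerprint from the previous visit (updated in place)
    fetched: dict mapping URL -> current page text
    Returns two lists: URLs of new pages and URLs of modified pages.
    """
    new, modified = [], []
    for url, text in fetched.items():
        digest = fingerprint(text)
        if url not in known:
            new.append(url)
        elif known[url] != digest:
            modified.append(url)
        known[url] = digest
    return new, modified


known = {}
print(check_pages(known, {"http://example.org/a": "first version"}))
print(check_pages(known, {"http://example.org/a": "second version",
                          "http://example.org/b": "another page"}))
```

An alert service would run such a check at regular intervals and send an email when either list is not empty.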

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or a thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
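
The idea of using a thesaurus before searching can be sketched in Python. The tiny thesaurus below is invented for illustration (it is not taken from any of the systems mentioned), the function `expand_query` is hypothetical, and the OR syntax is an assumption: check the manual of the database that you actually search.

```python
# Hypothetical mini-thesaurus using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water body"],
        "RT": ["marine biology"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall. Replacing a vague word by a more specific term from the thesaurus mainly increases precision.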

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
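
Both directions of citation searching can be sketched in Python. The reference lists below are invented, and the functions `snowball` and `build_citation_index` are hypothetical illustrations, not the procedure of any real citation database.

```python
from collections import defaultdict


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for paper in frontier:
            for cited in references.get(paper, []):
                if cited != seed and cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    cited_by = defaultdict(list)
    for paper, cited_papers in references.items():
        for cited in cited_papers:
            cited_by[cited].append(paper)
    return dict(cited_by)


# Hypothetical documents: each key cites the documents in its list.
references = {
    "C2005": ["B2000"],
    "B2000": ["A1990"],
    "D2004": ["B2000"],
    "A1990": [],
}
print(snowball("C2005", references))               # older publications
print(build_citation_index(references)["B2000"])   # more recent publications citing B2000
```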

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes called “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
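
The URL variants listed above can be generated mechanically before a link search is run. What follows is only a minimal sketch in Python, not a feature of any search system: the function name url_variants is hypothetical, and the default page is assumed to be called index.html or index.htm, which will not hold for every web server.

```python
def url_variants(url: str) -> list[str]:
    """Return spellings of a URL that other pages may have linked to."""
    # Drop the scheme so the variants match the "//host/path" notation.
    rest = url.split("://", 1)[-1].rstrip("/")
    variants = []
    # Assumption: the default page is named index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if rest.endswith("/" + default_page):
            variants.append(rest)                     # explicit default page
            rest = rest[: -len(default_page) - 1]     # bare directory path
            break
    else:
        variants.append(rest + "/index.html")
    variants.append(rest + "/")   # directory with trailing slash
    variants.append(rest)         # directory without trailing slash
    return variants
```

Each returned string can then be submitted as a separate link query to the search system of your choice, following the query syntax in its help pages.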

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can
NOT ONLY use an Internet index / search engine with
specialised searching in the hyperlink fields of web pages,
but ALSO search more “normally” for web documents/pages
that contain words or exact phrases that occur in the URL
of that document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 68


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information
offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Diagram with the following elements:
• the complete WWW
• a global subject directory, which can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the complete contents of
computers across the live Internet in real time when a user
submits a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
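
The two components described above can be illustrated with a toy inverted index. This is only a sketch under simplifying assumptions: the pages dictionary stands in for what a crawler (“robot”) would have collected, the function names are hypothetical, and real systems add ranking, continuous updating and far larger data structures.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain ALL words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The query is answered from the prebuilt index only; the pages themselves are not visited at search time, which is why such a system can respond quickly but may return outdated links.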

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web no longer builds its
own unique database of WWW sites, but relies on the same
WWW database as its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
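
The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis iteration in the spirit of the published PageRank idea. This is a sketch only and does not reproduce Google's actual ranking: the function name link_rank, the damping value 0.85 and the number of rounds are assumptions chosen for illustration.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        # Every page keeps a small base score, independent of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score
```

In a small example where pages a and b both link to c, and c links back to a, page c ends up with the highest score, and a scores higher than b because it is cited by the “important” page c.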

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde (~)
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
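
The effect of the tilde can be imitated by hand with a small synonym table. The sketch below only illustrates the principle of query expansion; it is not how Google implements the operator. The function name expand_query, the synonym table and the OR / parentheses notation of the output are assumptions.

```python
def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word itself
        else:
            parts.append(term)
    return " ".join(parts)
```

With a hypothetical table {"car": ["automobile", "vehicle"]}, the query cheap ~car rental becomes cheap (car OR automobile OR vehicle) rental. Words without a tilde are left untouched.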

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers access not only to files in HTML format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered an independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
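
Recall and precision are used here in their usual information retrieval sense: recall is the share of all relevant documents that is retrieved, and precision is the share of retrieved documents that is relevant. A minimal sketch of the calculation follows; the function name and the document identifiers in the usage note are made up for illustration.

```python
def precision_recall(retrieved: set[str],
                     relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one result set."""
    hits = len(retrieved & relevant)  # relevant documents that were found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if a search retrieves four documents of which two are relevant, while three relevant documents exist in total, precision is 2/4 and recall is 2/3. In practice the total number of relevant documents on the WWW is unknown, so recall can only be estimated.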

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
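
A proximity operator such as NEAR can be pictured as a test on word positions. The sketch below is a minimal illustration under assumed conventions (distance counted in words, case ignored); real systems that offer NEAR define the distance in their own way.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

doc = "Open access journals are indexed by several free search engines."
print(near(doc, "access", "journals", max_distance=1))
print(near(doc, "journals", "engines", max_distance=3))
```

A plain AND query would accept both pairs of words; the proximity test accepts only the pair that stands close together.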

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
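
Clustering "on the fly" means that the groups are derived from the retrieved results themselves at search time. The sketch below is a deliberately simple illustration of that idea and is not the algorithm used by Vivisimo: it groups result titles under the non-query word that most results share. The stop-word list, the titles and the query are invented.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "of", "and", "in", "for", "at"}

def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    ignore = STOP_WORDS | set(query.lower().split())
    tokens = [set(re.findall(r"[a-z]+", t.lower())) - ignore for t in titles]
    # Count in how many results each remaining word occurs.
    doc_freq = Counter(word for words in tokens for word in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokens):
        # Pick the word shared by most results; break ties alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "jaguar".
titles = [
    "Jaguar cars for sale",
    "Classic Jaguar cars",
    "Jaguar habitat in the rainforest",
    "Rainforest animals: the jaguar",
]
clusters = cluster_results(titles, "jaguar")
for label, members in clusters.items():
    print(label, "->", members)
```

No index of the documents is needed beforehand; only the short texts returned by the primary search systems are analysed.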

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you start using
the system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be searched one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
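
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration and not the method of any particular meta-search system: the function names are hypothetical, the two result lists are invented, and the URL comparison is simplified (the whole URL is lower-cased, and the scheme, a leading "www." and a trailing "/" are ignored).

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a simplified comparison key."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Merge ranked URL lists from several engines, keeping the first copy of each."""
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical ranked results returned by two primary search systems.
engine_1 = ["http://www.doaj.org/", "http://www.scirus.com"]
engine_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Real systems also have to decide how to rank the merged list, which this sketch leaves in the order of arrival.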

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
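
Tracking modifications in an existing page usually comes down to comparing a stored fingerprint of the page with a fingerprint of the page as it is now. The sketch below shows that comparison only; fetching the page and sending the alert are left out, and the function names and page texts are hypothetical.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest of the page content, ignoring whitespace changes."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint, current_text):
    """Compare the stored digest with the digest of the page as fetched now."""
    return fingerprint(current_text) != stored_fingerprint

# Hypothetical page contents at two moments in time.
old_page = "<h1>Open access journals</h1>\n<p>120 titles</p>"
new_page = "<h1>Open access journals</h1>\n<p>135 titles</p>"
stored = fingerprint(old_page)
print(has_changed(stored, old_page))
print(has_changed(stored, new_page))
```

Finding new pages is a different task: it requires repeating a stored query against an index and comparing the result lists.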

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
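
The procedure described above (look up a term in a thesaurus first, then use the terms found to formulate the query) can be sketched in a few lines. The thesaurus entry below is invented for illustration, and the choice to add only UF and NT terms is an assumption; which relations are useful depends on the search.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal waters"],
        "RT": ["marine"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put multi-word terms between quotes so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague term by a more suitable one found in the thesaurus is what improves precision.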

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
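
Both directions of citation searching can be illustrated with a toy citation graph. The data and function names below are hypothetical: `snowball` follows reference lists back in time from the seed document, and `cited_by` does what a citation index does, namely find the more recent documents that cite a given one.

```python
from collections import deque

# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": [],
    "X": ["seed"],
    "Y": ["X", "A"],
}

def snowball(seed, references):
    """Follow reference lists backwards in time, breadth-first."""
    found, seen, queue = [], {seed}, deque([seed])
    while queue:
        doc = queue.popleft()
        for cited in references.get(doc, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append(cited)
    return found

def cited_by(target, references):
    """Citation-index lookup: which documents cite the target?"""
    return sorted(doc for doc, refs in references.items() if target in refs)

print(snowball("seed", REFERENCES))
print(cited_by("seed", REFERENCES))
```

The backward search needs only the documents themselves, whereas the forward search needs a database in which the reference lists of many documents have been indexed.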

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
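
The variants listed above can be generated automatically before running a link search. The sketch below is a minimal illustration; it assumes that `index.html` or `index.htm` is the default page of the directory, which depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the spellings of a URL under which other pages may link to it."""
    variants = {url}
    # Assumption: index.html / index.htm is the default page of the directory.
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            variants.add(url[: -len(index_name)])
    # A directory address may be written with or without the trailing slash.
    for variant in list(variants):
        if variant.endswith("/"):
            variants.add(variant.rstrip("/"))
    return sorted(variants, key=len, reverse=True)

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant in the list is then used as a separate query in the link search of the chosen search engine.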

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 69

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

(Diagram: starting from the complete WWW,
a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”))

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
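
The mechanism described above can be illustrated with a very small inverted index. This is a sketch only: the page addresses and texts are invented, and a real Internet index adds ranking and many other refinements.

```python
import re
from collections import defaultdict

# Hypothetical pages, as a "robot" might have collected them from the Internet.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Biology of freshwater fish",
}

def build_index(collected_pages):
    """Build the index: each word points to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in collected_pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
hits = search(index, "marine biology")
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can answer quickly.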

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
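
The idea that a page ranks higher when many pages, or “important” pages, point to it can be sketched with a simplified PageRank-style calculation. This is an illustration under stated assumptions, not the actual algorithm used by Google: the link graph, the damping value 0.85 and the number of iterations are all chosen only for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style ranking of pages from their links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"], "e": ["a"]}
rank = link_rank(links)
```

In this example, page c receives the most links and ranks highest. Pages a and d each receive only one link, but d ranks higher than a because its link comes from the “important” page c.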

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
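
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the output syntax with OR are hypothetical and serve only as illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system relies on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the word
        else:
            expanded.append(term)      # words without tilde stay unchanged
    return " ".join(expanded)

expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; "cheap" has synonyms in the table but is left as it is because it was not marked.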

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
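
Recall and precision, the two measures mentioned above, can be computed as follows. The document identifiers and both result lists are invented for the example; they do not come from a real comparison of search systems.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant share of what was retrieved;
    recall = retrieved share of what is relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents are relevant for the query.
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "d7", "d8", "d9", "d2"]   # general search engine
specialised_results = ["d1", "d2", "d3", "d9"]     # specialised search engine

general = precision_recall(general_results, relevant)
specialised = precision_recall(specialised_results, relevant)
```

In this invented case the specialised system scores higher on both measures: it returns less irrelevant material and misses fewer relevant documents.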

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
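
Truncation means that a query such as librar* matches every word that begins with that stem. When a search system lacks the feature, a searcher can imitate it by listing the variants and joining them with OR. The sketch below assumes that the target system accepts an OR operator; the word list and the function name expand_truncation are hypothetical and not part of any search engine's interface.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR query."""
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    variants = sorted(w for w in vocabulary if w.lower().startswith(stem))
    return " OR ".join(variants) if variants else stem


# Hypothetical word list supplied by the searcher.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
query = expand_truncation("librar*", vocabulary)
print(query)
```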

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
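
Clustering "on the fly" can be illustrated with a toy example that uses nothing but the titles returned for one query. This is not Vivisimo's algorithm, which is not described here; the stopword list, the sample titles and the function name cluster_results are assumptions made for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "for", "on"}


def cluster_results(query, titles):
    """Group result titles under the most common non-query word they contain."""
    skip = STOPWORDS | set(query.lower().split())
    words_per_title = [
        [w for w in title.lower().split() if w not in skip] for title in titles
    ]
    # Document frequency: in how many titles does each word occur?
    frequency = Counter(w for words in words_per_title for w in set(words))
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        # Pick the word shared by most titles; ties break alphabetically.
        heading = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[heading].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "jaguar".
titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]
clusters = cluster_results("jaguar", titles)
for heading, members in clusters.items():
    print(heading, members)
```

The headings are derived only after the results arrive, so no document has to be classified before the search.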

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you use
the system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
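
The time saving comes from sending one query to several search systems at the same time instead of one after the other. A minimal sketch, assuming two stand-in functions (engine_a, engine_b) in place of real search systems, since no real search interface is specified here:

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real search systems (assumed); each returns a list of URLs.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; collect results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("tides", {"A": engine_a, "B": engine_b})
print(results)
```

The user formulates the query once; the program takes care of contacting each source.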

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
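
Integration with removal of repeated results can be sketched as follows. The round-robin merging and the URL normalisation rule (ignore "www." and a trailing slash) are simplifying assumptions chosen for this example; actual meta-search systems may rank and compare results quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Build a comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked lists from two primary search systems.
list_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
list_2 = ["http://dogpile.com/", "http://www.mamma.com"]
merged = merge_results([list_1, list_2])
print(merged)
```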

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories | Global Internet indexes | Multi-threaded search systems
Only a limited selection of Internet sources | About 1/3 of the Internet is covered by an index | These get information from directories and indexes
Browsing information sources is easy | Searching requires some skills and knowledge | Searching requires some skills and knowledge
Good for broad searches | Good for specific, narrow searches | Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
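
Tracking modifications in a known page can be done by storing a fingerprint of its content and comparing it on the next visit. The sketch below leaves out the downloading step and takes the page text as an argument; the URL, the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_for_update(url, page_text, stored):
    """Compare the current digest with the stored one.

    Returns 'new', 'modified' or 'unchanged' and updates the store.
    """
    digest = fingerprint(page_text)
    previous = stored.get(url)
    stored[url] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "modified"


stored = {}  # url -> digest from the previous visit
first = check_for_update("http://example.org/news", "Version 1", stored)
second = check_for_update("http://example.org/news", "Version 1", stored)
third = check_for_update("http://example.org/news", "Version 2", stored)
print(first, second, third)
```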

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
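
An alert service built on a stored query essentially repeats the search at intervals and reports what was not there before. A hedged sketch of that comparison step; the result lists are invented and the function name new_hits is hypothetical, not part of any alert service.

```python
def new_hits(previous_results, current_results):
    """Return URLs in the current run that were absent from the previous run, in rank order."""
    seen = set(previous_results)
    return [url for url in current_results if url not in seen]


# Hypothetical results of the same stored query in two consecutive runs.
last_week = ["http://example.org/a", "http://example.org/b"]
this_week = ["http://example.org/c", "http://example.org/a", "http://example.org/d"]
alert = new_hits(last_week, this_week)
print(alert)
```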

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
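
Matching new books against a stored interest profile can be sketched as follows. The profile, the book records and the function name matches_profile are invented for illustration and do not describe how any bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True if the book shares a subject category or a title keyword with the profile."""
    title_words = set(book["title"].lower().split())
    has_keyword = bool(title_words & profile["keywords"])
    has_subject = book["subject"] in profile["subjects"]
    return has_keyword or has_subject


# Hypothetical interest profile stored on the server.
profile = {"keywords": {"oceanography", "tides"}, "subjects": {"Marine science"}}
# Hypothetical newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "subject": "Earth science"},
    {"title": "Coastal Ecosystems", "subject": "Marine science"},
    {"title": "Medieval History", "subject": "History"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```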

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
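
The use of a thesaurus to enrich a query can be sketched with a tiny data structure that follows the relations BT, NT, RT and UF shown earlier. The entries below are invented for illustration; a real thesaurus is far larger, and the function name expand_query as well as the use of OR and quoted phrases are assumptions about the target search system.

```python
# A tiny hypothetical thesaurus (assumed entries, not from a real system).
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"], "RT": ["ocean"], "UF": ["seas"]},
    "ocean": {"BT": ["water body"], "NT": [], "RT": ["sea"], "UF": ["oceans"]},
}


def expand_query(term, relations=("UF", "NT", "RT")):
    """Build an OR query from a term plus its synonyms, narrower and related terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea")
print(query)
```

Adding synonyms and narrower terms mainly raises recall; restricting the expansion to UF alone keeps the query closer to the original concept.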

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
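
Both directions of citation searching can be illustrated with a small citation graph. The document names and reference lists below are invented; the sketch only shows the principle that a citation index is the inverse of the reference lists.

```python
from collections import defaultdict

# Hypothetical citation data: each document lists the earlier documents it cites.
CITES = {
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
    "paper_2003": ["seed_2000"],
    "paper_2004": ["seed_2000", "paper_1995"],
}


def build_citation_index(cites):
    """Invert the reference lists: for each document, which later documents cite it?"""
    cited_by = defaultdict(list)
    for document, references in cites.items():
        for reference in references:
            cited_by[reference].append(document)
    return dict(cited_by)


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        document = queue.pop(0)
        if document not in found:
            found.append(document)
            queue.extend(cites.get(document, []))
    return found


older = snowball("seed_2000", CITES)                        # towards the past
newer = build_citation_index(CITES).get("seed_2000", [])    # towards the future
print(older, newer)
```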

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
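
As a rough sketch of how such counts could be derived once the citing/cited pairs have been collected by a citation search: the article identifiers, author names and function names below are invented for illustration, and the result is only an estimate, since no citation database covers all publications.

```python
from collections import Counter

# Hypothetical (citing article, cited article) pairs found by a citation search.
CITATIONS = [("p4", "p1"), ("p5", "p1"), ("p5", "p2"), ("p6", "p1")]

# Hypothetical mapping from article to its author.
AUTHORS = {"p1": "Author A", "p2": "Author B", "p3": "Author A"}


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, authors):
    """Add up the citations received by all articles of each author."""
    counts = Counter()
    for _citing, cited in citations:
        if cited in authors:
            counts[authors[cited]] += 1
    return counts
```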

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
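
The variants listed above can be generated automatically. The following is a minimal sketch; the function name and the assumption that the default page is called index.html or index.htm are illustrative only.

```python
def url_variants(url):
    """Return spelling variants of a URL that may all occur in links."""
    variants = [url]
    # Assumption: the default page of the site is index.html or index.htm.
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            url = url[: -len(index_name)]  # keep the trailing slash
            variants.append(url)
            break
    # Also try the form without the trailing slash.
    if url.endswith("/"):
        variants.append(url[:-1])
    return variants
```

Each variant can then be submitted as a separate link query to the search system of your choice.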

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 70

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: within the complete WWW, a global subject directory can lead to a directory limited to sources in/of a country or region, or to a directory restricted to a specific subject domain (“portal”).]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
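
The core of such a database is an index from words to pages. The sketch below is a strongly simplified illustration, not the design of any real search engine: the three pages are invented, the crawling “robot” is left out (the page texts are simply given), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages, as if already collected by a crawler ("robot").
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "History of the North",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs of pages containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why results can be returned quickly but may be out of date.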

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
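
The idea that links from “important” pages count for more can be illustrated with a small iterative calculation. This is only a textbook-style sketch of link-based ranking under assumed parameters (the damping value, the number of iterations and the toy link data are all illustrative); it is not a description of the algorithm that Google actually runs.

```python
# Hypothetical toy web: page -> list of pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}


def link_rank(links, damping=0.85, iterations=50):
    """Return page -> score; a page scores higher when many pages,
    or pages with a high score themselves, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass the score on, divided over the outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for target in pages:
                    new[target] += share
        score = new
    return score
```

In the toy data, page "c" receives the most links and ranks highest; page "a" receives only one link, but from the important page "c", and therefore ranks above "b" and "d".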

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
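
Manual expansion of a query with thesaurus terms can be sketched as follows. The mini-thesaurus and the function name are hypothetical, and the output uses a generic Boolean notation with OR and parentheses; whether a particular search system accepts that notation must be checked in its help pages.

```python
# Hypothetical mini-thesaurus: word -> list of synonyms / related terms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["fisheries", "angling"],
}


def expand_query(query, thesaurus):
    """Replace each word by an OR-group of the word and its synonyms."""
    parts = []
    for word in query.split():
        synonyms = thesaurus.get(word.lower(), [])
        if synonyms:
            parts.append("(" + " OR ".join([word] + synonyms) + ")")
        else:
            parts.append(word)
    return " ".join(parts)
```

For example, expand_query("fishing policy", THESAURUS) turns the single word “fishing” into a group of three alternative terms, while “policy” is left as it is.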

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
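
When several Internet indexes are used for the same query, the result lists have to be combined and duplicates removed. A minimal sketch follows; the function name is hypothetical and the interleaving by rank is just one simple way of merging, chosen for illustration.

```python
def merge_results(*result_lists):
    """Combine ranked URL lists from several search systems,
    taking rank 1 from each system first, then rank 2, and so on,
    and keeping each URL only once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

A URL that is found by only one of the systems still ends up in the merged list, which is exactly the gain in coverage that using more than one index is meant to give.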

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
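The terms recall and precision used above can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers and both result lists are invented, and the function name `precision_recall` is an assumption, not part of any search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: document ids returned by the search system
    relevant:  document ids that are truly relevant for the searcher
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which share of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which share of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example data for the same query sent to two systems.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_index = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_index = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_index, relevant_docs))      # (0.25, 0.5)
print(precision_recall(specialised_index, relevant_docs))  # (0.75, 0.75)
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes.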

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
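When a search system offers no truncation, a searcher can work around this by expanding the truncated word beforehand and combining the variants with OR. The following is a minimal sketch of that idea, assuming that the search system accepts an OR operator; the word list and the function names are hypothetical.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def build_or_query(pattern, vocabulary):
    """Join the expanded words with OR; fall back to the bare stem if none match."""
    words = expand_truncation(pattern, vocabulary)
    return " OR ".join(words) if words else pattern.rstrip("*")


# Hypothetical word list; in practice it could come from a dictionary or thesaurus.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(build_or_query("librar*", vocabulary))
# librarian OR libraries OR library
```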

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
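A proximity operator can be approximated on the client side by filtering texts that have already been retrieved. The sketch below shows what a NEAR test does; it is an illustration only, and the function `near` with its `max_distance` parameter is an assumption, not a feature of any search system mentioned here.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every position of term_a with every position of term_b.
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "Blaise Pascal was a French philosopher; Pascal is also a programming language."
print(near(doc, "pascal", "philosopher", max_distance=2))  # True
print(near(doc, "blaise", "language", max_distance=3))     # False
```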

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
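The relations in this scheme can be sketched in a few lines: one query from the user is passed on in parallel to several search systems, and the answers are collected per system. The two search functions below are stand-ins that return invented URLs; no real search system is contacted and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_system_1(query):
    # Stand-in for a real search system; returns invented result URLs.
    return [f"http://example.org/a?q={query}", f"http://example.org/b?q={query}"]


def search_system_2(query):
    # Second stand-in with a partly overlapping result list.
    return [f"http://example.org/b?q={query}", f"http://example.org/c?q={query}"]


def meta_search(query, systems):
    """Send the query to every system in parallel; return the results per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("pascal", {"system 1": search_system_1,
                                 "system 2": search_system_2})
for name, urls in results.items():
    print(name, urls)
```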

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
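Clustering on the fly can be illustrated with a deliberately naive method that looks only at the short descriptions of the retrieved results. This is not the algorithm of Vivisimo, which is not described here; it is a sketch in which each result is filed under the most widely shared word it contains. The stopword list and the example descriptions are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "the", "to", "was"}


def terms(snippet, query):
    """Lowercased words of a snippet, without stopwords and query words."""
    skip = STOPWORDS | set(query.lower().split())
    return {w for w in re.findall(r"[a-z]+", snippet.lower()) if w not in skip}


def cluster_results(snippets, query):
    """Group snippets under the most widely shared term each one contains."""
    doc_freq = Counter(t for s in snippets for t in terms(s, query))
    clusters = defaultdict(list)
    for s in snippets:
        candidates = terms(s, query)
        if candidates:
            # The term found in most snippets wins; ties are broken alphabetically.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(s)
    return dict(clusters)


snippets = [
    "Blaise Pascal, French philosopher",
    "Pascal the philosopher and mathematician",
    "Pascal programming language tutorial",
    "Compilers for the Pascal programming language",
]
clusters = cluster_results(snippets, "pascal")
for label, members in clusters.items():
    print(label, len(members))
```

For the ambiguous query pascal, the four invented descriptions fall into two groups, one for the philosopher and one for the programming language.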

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information source would
otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
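The integration of results with removal of repeated results can be sketched as follows. This is one possible approach, not the method of any particular meta-search system: the ranked lists are interleaved and a document is kept only the first time it appears. Treating addresses with and without "www." as the same document, and ignoring query strings, are simplifying assumptions; the URLs are invented.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lowercased host without 'www.', plus path without final slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked lists round-robin and drop repeated documents."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


index_a = ["http://www.example.org/library/", "http://www.example.org/journals/"]
index_b = ["http://example.org/journals", "http://example.org/library",
           "http://www.example.org/catalogue/"]
print(merge_results(index_a, index_b))
```

Five results from two systems are reduced to three distinct documents in this example.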

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
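How a tracking service can distinguish new pages from modified pages is sketched below. This is an assumption about one simple technique, not a description of any existing service: a fingerprint of each page text is stored, and at the next visit the fingerprints are compared. Page texts are passed in directly, so nothing is downloaded; the URLs and texts are invented.

```python
import hashlib


def fingerprint(page_text):
    """Hash of the page text with whitespace collapsed, so layout noise is ignored."""
    collapsed = " ".join(page_text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def check_for_updates(stored, current_pages):
    """Compare current pages with stored fingerprints; return (new, changed) URLs."""
    new, changed = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest  # remember the latest version for the next check
    return new, changed


stored = {}
first = check_for_updates(stored, {"http://example.org/a": "Version 1"})
second = check_for_updates(stored, {"http://example.org/a": "Version 2",
                                    "http://example.org/b": "Hello"})
third = check_for_updates(stored, {"http://example.org/a": "Version   2"})
print(first, second, third)
```

An alert service would send an email whenever either of the two returned lists is not empty.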

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
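The use of a thesaurus before searching can be sketched as a small query expansion step. The thesaurus entry below is invented for illustration and follows the relation codes BT, NT, RT and UF; the function `expand_query` and the choice to expand with UF and NT terms by default are assumptions.

```python
# Invented thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],
        "NT": ["Atlantic Ocean"],
        "RT": ["marine science"],
        "UF": ["sea"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return an OR query made of the term plus terms from the chosen relations."""
    entry = thesaurus.get(term.lower(), {})
    words = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in words:
                words.append(related)
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{w}"' if " " in w else w for w in words]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
# ocean OR sea OR "Atlantic Ocean"
print(expand_query("ocean", THESAURUS, relations=("BT",)))
# ocean OR "water body"
```

Adding synonyms and narrower terms tends to increase recall; choosing a broader term widens the search further, usually at the cost of precision.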

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
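
The two directions described above can be illustrated with a toy citation graph in Python. The paper identifiers and the data are invented for this sketch; a real citation index is of course far larger and is queried through its own interface.

```python
# Toy citation data (invented): each paper maps to the papers it cites.
CITES = {
    "A2005": ["B2001", "C1998"],
    "B2001": ["C1998", "D1990"],
    "C1998": ["D1990"],
    "E2006": ["A2005"],
    "D1990": [],
}


def snowball(seed, cites):
    """Follow reference lists back in time, starting from the seed."""
    found, todo = set(), [seed]
    while todo:
        paper = todo.pop()
        for ref in cites.get(paper, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def cited_by(seed, cites):
    """What a citation index offers: more recent papers citing the seed."""
    return {paper for paper, refs in cites.items() if seed in refs}


print(sorted(snowball("B2001", CITES)))   # older publications
print(sorted(cited_by("B2001", CITES)))   # more recent publications
```

Note that snowballing needs only the documents themselves, whereas finding the citing papers requires an index that has processed the reference lists of many other publications.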

211

***-

Information retrieval:
using citations (scheme)
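[Scheme: a time axis running from past through now to future. Starting from the seed document, snowball citation searching leads back to older publications, while citation indexing leads forward to more recent publications.]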

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
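
A small Python helper can generate these variants so that none is forgotten. This is a sketch under assumptions: the function name url_variants is invented, and index.html is assumed to be the default document name of the web server, which is not the case everywhere.

```python
def url_variants(url):
    """Return the three spellings of one address, without the scheme."""
    # Drop "http:" or "https:" but keep the leading "//".
    rest = url.split("://", 1)[-1]
    base = "//" + rest
    # Reduce the address to its shortest form first.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, following the query syntax of the chosen search system.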

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 71

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
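
The “simply stated” rule above (more links received, higher rank) can be sketched as follows in Python. The link data and the function name rank_by_inlinks are invented for illustration; this is not the method actually used by Google, which is more refined than a plain count.

```python
from collections import Counter

# Invented link data: (page that contains the link, site that receives it).
LINKS = [
    ("p1", "siteA"), ("p2", "siteA"), ("p4", "siteA"),
    ("p1", "siteB"), ("p3", "siteB"),
    ("p2", "siteC"),
]


def rank_by_inlinks(entries, links):
    """Sort directory entries by number of received links, highest first."""
    counts = Counter(target for _source, target in links)
    # Ties are broken alphabetically, as in a plain alphabetical listing.
    return sorted(entries, key=lambda entry: (-counts[entry], entry))


print(rank_by_inlinks(["siteB", "siteC", "siteA", "siteD"], LINKS))
```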

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: the complete WWW; a global subject directory can lead to a directory limited to sources in/of a country or region, or to a directory restricted to a specific subject domain (“portal”).]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
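
The mechanism described above can be illustrated with a very small inverted index in Python. The pages and URLs below are invented and stand in for what a “robot” would have collected; real systems add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Invented pages standing in for documents collected by a crawler.
PAGES = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Fishing and marine biology",
    "http://example.org/c": "Civil engineering",
}


def build_index(pages):
    """Map every word to the set of URLs of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the prepared index only, which is why such a system does not need to visit the Internet in real time when a user searches.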

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and search
engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
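
The two criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified, PageRank-style iteration in Python. This is a textbook-style sketch and not the algorithm actually used by Google; the link data, the damping value of 0.85 and the function name link_rank are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Estimate the importance of pages from the links between them."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Invented links: "d" and "e" each receive one link, but "d" receives it
# from "c", which itself receives links from two pages.
LINKS = {"a": ["c", "e"], "b": ["c"], "c": ["d"]}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy example "d" ends up above "e" although both receive exactly one link, which shows the effect of the second criterion.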

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages
(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
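The two quality measures named on this slide, recall and precision, can be made concrete with a small calculation. The following is only a sketch: the document identifiers and both result lists are invented for illustration and do not come from any real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of what was retrieved is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of what is relevant was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant documents exist for the query.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_index_results = ["d1", "d7", "d8", "d9", "d2"]
specialised_index_results = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_index_results, relevant_docs))      # (0.4, 0.5)
print(precision_recall(specialised_index_results, relevant_docs))  # (0.75, 0.75)
```

In this invented case the specialised result set scores higher on both measures, which is the situation the slide describes; real systems will not always behave this way.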

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
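To show what manual truncation means in a retrieval system that does support it, here is a minimal sketch. It assumes the common convention that a trailing `*` stands for any word ending; the function names are hypothetical and this is not how any particular search engine implements the feature.

```python
import re


def truncation_to_regex(term):
    """Translate a truncated term such as 'librar*' into a compiled regex."""
    stem = re.escape(term.rstrip("*"))
    # A trailing '*' means: accept any further word characters.
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def matches(term, text):
    """Return all words in text that match the (possibly truncated) term."""
    return [m.group(0) for m in truncation_to_regex(term).finditer(text)]


text = "The library and other libraries employ many librarians."
print(matches("librar*", text))  # ['library', 'libraries', 'librarians']
print(matches("library", text))  # ['library']
```

Without truncation, the searcher has to type every word variant explicitly and combine them with OR.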

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
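A proximity operator such as NEAR restricts results to documents in which two terms occur close together. The sketch below illustrates the idea on a single text, assuming that distance is counted in words; the function name and the default distance are hypothetical choices, not taken from any real system.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every position of term_a with every position of term_b.
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


doc = "Open access journals make scientific articles available free of charge."
print(near(doc, "open", "journals", max_distance=2))  # True
print(near(doc, "open", "charge", max_distance=2))    # False
```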

96

**--

Meta-search systems:
scheme 1
[Scheme. Components shown: User (In / Out); client computer
+ WWW client program; WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Scheme. Components shown: User (In / Out); client computer
+ multi-threaded Internet search client program;
Internet / WWW; WWW server computers with Internet search
systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Scheme. Components shown: User (In / Out); client computer
+ WWW client program; WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
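Clustering "on the fly" means that grouping is computed from the retrieved results themselves, after the search. The sketch below is a deliberately simple stand-in and is not Vivisimo's method: it assumes that each result is represented by its title only and labels each title with the non-query word that it shares with the most other titles. The titles and the stopword list are invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}


def cluster_results(query, titles):
    """Group result titles under the most frequent non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        tokenised.append(words - query_words - STOPWORDS)
    # Count in how many titles each candidate label word occurs.
    doc_freq = Counter(word for words in tokenised for word in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        if words:
            # The most widely shared word wins; alphabetical order breaks ties.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal philosopher and mathematician",
]
for label, group in cluster_results("pascal", titles).items():
    print(label, "->", group)
```

For the ambiguous query pascal, these invented titles fall into a "programming" group and a "philosopher" group. Production systems use far richer text analysis than this word count.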

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme. Components shown: User (In / Out); client computer
+ multi-threaded Internet search client program;
Internet / WWW; WWW server computers with Internet search
systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based
information source would otherwise have to be searched
one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
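The integration step with removal of repeated results can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular meta-search system: results are interleaved by rank, and two URLs count as the same result when they differ only in the case of the host name, a leading "www." or a trailing slash. The two result lists are invented.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Query strings and fragments are ignored in this simplified key.
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.doaj.org/", "http://citeseer.ist.psu.edu/"]
engine_b = ["http://doaj.org", "http://www.scirus.com"]
print(merge_results(engine_a, engine_b))
# ['http://www.doaj.org/', 'http://citeseer.ist.psu.edu/', 'http://www.scirus.com']
```

Interleaving by rank is only one possible choice; a real system may weight the primary systems differently or re-rank the merged list.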

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme. Components shown:
» telnet, ftp, ...
» Internet / WWW
» Databases and file archives accessible through the
Internet (CGI, ASP,...)
» Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
» Information accessible only when passwords are used
» Rapidly changing information, such as news
» Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not rely on WWW indexes alone;
also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems accept a query in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
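The difference between a modified page and a new page can be sketched with stored fingerprints of page contents. This is a minimal illustration, not the mechanism of any named alert service: the URLs and page texts are invented, and a real service would download the pages and send alerts by e-mail.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored, current_pages):
    """Compare current page texts with stored fingerprints.

    Returns a dict mapping each URL to 'new', 'changed' or 'unchanged'
    and updates `stored` in place.
    """
    report = {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            report[url] = "new"
        elif stored[url] != digest:
            report[url] = "changed"
        else:
            report[url] = "unchanged"
        stored[url] = digest
    return report


# Hypothetical page contents on three successive visits.
stored = {}
first = check_pages(stored, {"http://example.org/a": "Version 1"})
second = check_pages(stored, {"http://example.org/a": "Version  1",
                              "http://example.org/b": "Hello"})
third = check_pages(stored, {"http://example.org/a": "Version 2"})
print(first, second, third, sep="\n")
```

A fingerprint detects that something changed but not what changed; services that show the modification itself have to keep the earlier text.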

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases (accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus for searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
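The procedure described on this slide (consult a thesaurus first, then build the query) can be sketched as follows. The thesaurus entry is invented for illustration and does not come from any real thesaurus; the relation codes BT, NT, RT and UF are the ones used on the previous slide, and the function names are hypothetical.

```python
# Hypothetical thesaurus entries, invented for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine environment"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms reached through the given relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_boolean_query(terms):
    """Join terms with OR, quoting multi-word terms as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(to_boolean_query(expand_query("sea", THESAURUS)))
# sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms with OR tends to raise recall. Choosing a more specific term than the original one tends to raise precision. Which relations to follow is a decision for the searcher.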

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information is becoming available
online.
• A growing amount of this online information is becoming
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
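
The two directions of citation searching can be illustrated with a small sketch. The document names and the `references` table are invented toy data, and the function names are hypothetical; a real citation index is of course far larger.

```python
# Hypothetical toy data: each document maps to the documents it cites.
references = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, refs):
    """Backward search: follow reference lists repeatedly
    to collect older publications (snow-ball effect)."""
    found = set()
    stack = list(refs.get(doc, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(refs.get(current, []))
    return found


def cited_by(doc, refs):
    """Forward search: what a citation index provides,
    that is, the more recent documents that cite `doc`."""
    return {source for source, cited in refs.items() if doc in cited}


print(sorted(snowball("seed", references)))  # ['old_a', 'old_b', 'older_c']
print(sorted(cited_by("seed", references)))  # ['new_x', 'new_y']
```

The backward search needs only the documents themselves; the forward search needs an index built over many documents, which is why a citation index is a separate service.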

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
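
A small helper can generate these variants so that none is forgotten. This is a minimal sketch: the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called `index.html`.

```python
def url_variants(url, default_page="index.html"):
    """Return the common spellings under which the same page may be linked.

    Assumption: the web server serves `default_page` for the directory URL.
    """
    base = url
    if base.endswith("/" + default_page):
        base = base[: -len(default_page)]
    base = base.rstrip("/")
    return [base + "/" + default_page, base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, with the query syntax of the chosen search system.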

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 72

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services are available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of citations received by scholarly articles (although still limited in comparison with commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
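
The "simply stated" version of this ranking can be sketched as a count of incoming links. The function name `rank_by_inlinks` and the site names are hypothetical, and this is only the simplified idea described above, not the actual link analysis used by Google.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Rank sites by the number of distinct other sites that link to them.

    `links` is an iterable of (source_site, target_site) pairs.
    Sites that receive no links at all do not appear in the result.
    """
    unique_links = {(s, t) for s, t in links if s != t}  # ignore self-links
    counts = Counter(target for _, target in unique_links)
    # Highest count first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented toy link data.
links = [("a", "c"), ("b", "c"), ("a", "b"), ("c", "b"), ("d", "c"), ("c", "c")]
print(rank_by_inlinks(links))  # [('c', 3), ('b', 2)]
```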

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
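
The mechanism described above can be sketched with a tiny inverted index. This is only an illustration under simplifying assumptions: the page texts and `example.org` URLs are invented, the "robot" that collects pages is replaced by a ready-made dictionary, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build a word -> set of URLs index from already collected pages.

    `pages` maps each URL to the text of that page.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain ALL the words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented sample pages, standing in for what a crawler would collect.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and sea level",
    "http://example.org/c": "Marine engineering",
}
index = build_index(pages)
print(sorted(search(index, "marine sea")))  # ['http://example.org/a']
```

The query is answered from the index only, which is why such a system is fast but can also return pages that have changed or disappeared since they were collected.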

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
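
The idea that links from "important" pages count for more can be sketched with a simplified, PageRank-style iteration. This is an assumption-laden illustration and not the algorithm that Google actually runs: the function name `link_rank`, the damping value 0.85, the number of iterations and the sample pages are all chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages weigh more.

    `links` maps each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to (about) 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented toy web of four pages.
web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "blog": ["home"],
}
scores = link_rank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, "home" comes out on top because every other page links to it, and "about" and "news" score higher than "blog" because they receive a link from "home".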

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, which can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
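
Building such a query can be sketched as follows. The helper `add_synonym_operator` is hypothetical; it only produces the query text and does not contact any search engine.

```python
def add_synonym_operator(query, words_to_expand):
    """Put a tilde in front of the selected words of a query."""
    selected = {word.lower() for word in words_to_expand}
    parts = []
    for word in query.split():
        if word.lower() in selected and not word.startswith("~"):
            parts.append("~" + word)
        else:
            parts.append(word)
    return " ".join(parts)


print(add_synonym_operator("word1 word2 word3 word4", ["word2"]))
# word1 ~word2 word3 word4
```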

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
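
The two quality measures named on this slide, recall and precision, can be made concrete with a small calculation. The Python sketch below is illustrative only: the function name precision_recall and the document identifiers are assumptions made for this example, not part of any search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: the documents that the search system returned
    relevant:  the documents that are actually relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d3", "d5", "d6", "d7"])
print(p, r)
```

A specialised index aims to raise both numbers at once for queries within its subject area.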

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
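
When a search system offers no truncation, a searcher can approximate it by listing the word variants explicitly and combining them with OR. The Python sketch below shows the idea under stated assumptions: the word list is hypothetical, and the function names expand_truncation and to_or_query are invented for this illustration.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the
    matching words of a known vocabulary."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def to_or_query(words):
    """Join alternative words into a Boolean OR expression."""
    return " OR ".join(words)


# Hypothetical vocabulary, for illustration only.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
variants = expand_truncation("librar*", vocabulary)
print(to_or_query(variants))
```

The resulting OR expression can then be typed into a search system that supports Boolean OR but not truncation.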

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
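
To illustrate what a proximity operator such as NEAR does, the Python sketch below checks whether two words occur within a given number of words of each other. It is a minimal sketch, not the implementation of any search system; the function name near and the default distance of 5 words are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance
    words of each other in text (the order does not matter)."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


text = "Open access archives offer free access to scientific articles."
print(near(text, "open", "archives", max_distance=2))
print(near(text, "open", "articles", max_distance=2))
```

The first check succeeds because the two words are 2 positions apart; the second fails because they are 8 positions apart.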

96

**--

Meta-search systems:
scheme 1
[Diagram. Components: User; client computer + WWW client program;
WWW server computer; Internet / WWW; WWW server computers with
Internet search systems. Arrows: In, Out.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components: User; client computer + multi-threaded
Internet search client program; Internet / WWW; WWW server computers
with Internet search systems. Arrows: In, Out.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components: User; client computer + WWW client program;
WWW server computer; Internet / WWW; WWW server computers with
Internet search systems. Arrows: In, Out.]

100

**--

Meta-search systems:
relations
[Diagram. Layers from top to bottom: User; an Internet meta-search
system; Internet search systems 1 and 2; the collected database of
each Internet search system; WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
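
The slides do not describe how Vivisimo clusters results, so the Python sketch below is NOT its algorithm. It only illustrates the general idea of clustering on the fly: the groups are computed from the retrieved titles alone, after the search, by putting together the titles that share a word. The titles, the stopword list and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "by"}


def tokens(text, query):
    """Lower-case words of text, without stopwords and query words."""
    skip = STOPWORDS | set(query.lower().split())
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in skip]


def cluster_results(titles, query):
    """Group result titles under the most widely shared word."""
    # Document frequency: in how many titles does each word occur?
    df = Counter(w for t in titles for w in set(tokens(t, query)))
    clusters = defaultdict(list)
    for title in titles:
        shared = [w for w in tokens(title, query) if df[w] > 1]
        # The most widely shared word wins; ties are broken alphabetically.
        label = min(shared, key=lambda w: (-df[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free Pascal compiler for the programming language",
    "Pascal (unit of pressure)",
]
for label, members in cluster_results(titles, "pascal").items():
    print(label, members)
```

With these titles, the philosopher and the programming language end up in separate groups, and the remaining title falls into a group labelled "other".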

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components: User; client computer + multi-threaded
Internet search client program; Internet / WWW; WWW server computers
with Internet search systems. Arrows: In, Out.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
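
The time saving comes from sending one query to several sources at the same time instead of one after the other. The Python sketch below shows this "multi-threaded" idea with the standard library. The two engine functions are hypothetical stand-ins that return fixed lists; a real client would contact the search systems over the network at that point.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems: each function takes a
# query and returns a ranked list of URLs.
def search_engine_a(query):
    return ["http://example.org/a1", "http://example.org/shared"]


def search_engine_b(query):
    return ["http://example.org/shared", "http://example.org/b1"]


def meta_search(query, engines):
    """Send one query to all engines at the same time and
    return the result list of each engine, keyed by engine name."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: f.result() for name, f in futures.items()}


engines = {"A": search_engine_a, "B": search_engine_b}
results = meta_search("open access journals", engines)
print(results)
```

The searcher works with one interface (meta_search) while each source is queried in its own thread.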

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
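
The integration step with removal of repeated results can be sketched as follows in Python. This is one possible approach under stated assumptions, not the method of any particular meta-search system: the ranked lists are interleaved, and two URLs count as the same result when they match after a simple normalisation. The function names and the normalisation rules are choices made for this example.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparable form: lower-case host without a
    leading 'www.', and path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave several ranked lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
list_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
list_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
print(merge_results([list_1, list_2]))
```

Here the second list's first result is recognised as a repeat of the first list's first result and is shown only once.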

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Elements shown: telnet, ftp, ...; Internet; WWW;
databases and file archives accessible through the Internet
(CGI, ASP, ...); static texts in the WWW that can be indexed
( = on HTTP server computers), covered partly by Internet indexes;
information accessible only when passwords are used;
rapidly changing information, such as news; Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
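
Tracking modifications in an existing page can be reduced to a simple comparison: store a fingerprint of the page content and compare it with the fingerprint of the content retrieved later. The Python sketch below shows this principle only. The page texts are hypothetical, and the fetching of the page and the sending of an alert are left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with the current page content."""
    return fingerprint(current_text) != stored_fingerprint


# Hypothetical page contents retrieved on two different days.
old_text = "<html><body>Conference programme, version 1</body></html>"
new_text = "<html><body>Conference programme, version 2</body></html>"

stored = fingerprint(old_text)
print(has_changed(stored, old_text))
print(has_changed(stored, new_text))
```

Finding new pages is a different task: it requires repeating a stored search query and comparing the result lists, not comparing the content of one known page.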

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM and RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM and RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Diagram. A term in the centre is linked to:
term(s) with broader meaning, through BT (= Broader Term);
other term(s), through RT (= Related Term);
synonym(s), through UF (= Use(d) For);
term(s) with narrower meaning, through NT (= Narrower Term).]

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
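
The workflow above (look up a term in a thesaurus, collect further terms, then formulate a query) can be sketched in a few lines of Python. This is only an illustration: the toy entry for "sea", the function names, and the OR query syntax are assumptions, not taken from any particular thesaurus or database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ThesaurusEntry:
    """One thesaurus record with the four relation types BT, NT, RT, UF."""
    term: str
    bt: List[str] = field(default_factory=list)  # broader terms
    nt: List[str] = field(default_factory=list)  # narrower terms
    rt: List[str] = field(default_factory=list)  # related terms
    uf: List[str] = field(default_factory=list)  # synonyms (used for)


# Hypothetical toy data, for illustration only.
THESAURUS: Dict[str, ThesaurusEntry] = {
    "sea": ThesaurusEntry(
        "sea",
        bt=["body of water"],
        nt=["inland sea"],
        rt=["coast"],
        uf=["ocean"],
    ),
}


def expand_term(term: str, thesaurus: Dict[str, ThesaurusEntry],
                relations: Sequence[str] = ("uf", "nt")) -> List[str]:
    """Return the term plus the terms reached through the chosen relations."""
    terms = [term]
    entry = thesaurus.get(term)
    if entry is not None:
        for relation in relations:
            for other in getattr(entry, relation):
                if other not in terms:
                    terms.append(other)
    return terms


def build_query(terms: Sequence[str]) -> str:
    """Join terms with OR; multi-word terms are quoted as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = build_query(expand_term("sea", THESAURUS))
print(query)
```

Adding synonyms (UF) and narrower terms (NT) mainly raises recall; adding broader terms (BT) would widen the query further, at the cost of precision.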

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
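
The two directions of citation searching can be illustrated with a small sketch. The citation data and the paper identifiers below are hypothetical; a real citation index holds the same kind of relation for millions of documents.

```python
from collections import deque

# Hypothetical citation data: each paper maps to the earlier papers it cites.
CITES = {
    "seed": ["a1990", "b1995"],
    "b1995": ["a1990", "c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def snowball(seed, cites):
    """Backward search: follow reference lists from the seed to older work."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        for ref in cites.get(paper, []):
            if ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append(ref)
    return found


def cited_by(target, cites):
    """Forward search: the question a citation index answers."""
    return sorted(paper for paper, refs in cites.items() if target in refs)


print(snowball("seed", CITES))   # older publications reached from the seed
print(cited_by("seed", CITES))   # more recent publications citing the seed
```

The backward search needs only the documents themselves; the forward search needs an index that has been built over the reference lists of many documents, which is why a citation index is required for it.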

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
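
Generating the variants to search for can be automated. The following is a minimal sketch; the function name and the list of assumed default document names (index.html, index.htm) are illustrative choices, and the search system's own help pages remain the reference for the actual query syntax.

```python
def url_variants(url):
    """Return the spelling variants of a URL to try in a link search."""
    # Drop the scheme (http://) so the variants take the "host/path" form.
    rest = url.split("://", 1)[-1].rstrip("/")
    # Strip a trailing default document name, if present (assumed names).
    document = "index.html"
    for default in ("index.html", "index.htm"):
        if rest.endswith("/" + default):
            rest = rest[: -len(default) - 1]
            document = default
            break
    return [rest + "/" + document, rest + "/", rest]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link query, and the result sets are merged.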

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 73

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
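
The core of such a system, an index built from the text of collected pages and a search function that consults only that index, can be sketched as follows. The three pages are hypothetical stand-ins for what a crawler has already collected, and combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what the "robot" has already collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "North Sea fishing statistics",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


INDEX = build_index(PAGES)
print(search(INDEX, "north sea"))
```

The search never touches the pages themselves, only the index. That is why a query is answered quickly, and also why results can be out of date until the robot revisits a page.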

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi --
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
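
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a textbook sketch, not Google's actual algorithm; the link data, the damping value and the number of iterations are assumptions.

```python
# Hypothetical link data: each page maps to the pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["c"],
}


def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page shares its score among its targets."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


RANK = link_rank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this toy data, page c ranks first because three pages link to it. Page d receives only one link, but that link comes from the important page c, so d still ranks above a and b.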

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
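
When queries are generated by a script, the tilde can be added programmatically. A minimal sketch follows, assuming the tilde syntax described above; the function name is hypothetical, and whether a given search engine (still) honours the operator should be checked in its help pages.

```python
def mark_for_expansion(words, expand):
    """Prefix a tilde to each query word that should be expanded with synonyms."""
    return " ".join("~" + word if word in expand else word for word in words)


query = mark_for_expansion(["word1", "word2", "word3", "word4"], {"word2"})
print(query)
```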

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
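Why more than one index helps can be shown with a small calculation. This is only a sketch: the ten pages and the contents of the two indexes are invented, and `coverage` is a hypothetical helper name.

```python
# Invented coverage data: which of ten pages each index has collected.
all_pages = {f"page{i}" for i in range(1, 11)}
index_a = {"page1", "page2", "page3", "page4"}
index_b = {"page3", "page4", "page5", "page6", "page7"}


def coverage(indexed, universe):
    """Fraction of the universe that is present in the indexed set."""
    return len(indexed & universe) / len(universe)


print(coverage(index_a, all_pages))            # 0.4
print(coverage(index_b, all_pages))            # 0.5
print(coverage(index_a | index_b, all_pages))  # 0.7
```

The two indexes overlap, so using both yields less than the sum of their separate coverages, but still more than either one alone.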

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
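Recall and precision can be made concrete with a small calculation. This is only a sketch: the document identifiers and both result sets are invented, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of document ids returned by the search system
    relevant:  set of document ids that are actually relevant
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 relevant documents exist in total.
relevant = {"d1", "d2", "d3", "d4"}
general_index = {"d1", "d5", "d6", "d7", "d8"}   # broad system, many irrelevant hits
specialised_index = {"d1", "d2", "d3", "d9"}     # subject-specific system

print(precision_recall(general_index, relevant))      # (0.2, 0.25)
print(precision_recall(specialised_index, relevant))  # (0.75, 0.75)
```

Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved.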

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
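To illustrate what truncation means for a searcher, here is a minimal sketch. It assumes a small invented vocabulary and hypothetical helper names; it does not describe how any real search engine works. Without truncation, the searcher has to write out the word variants by hand, for example combined with OR.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the matching words."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))


def or_query(words):
    """Combine the word variants into one query string."""
    return " OR ".join(words)


# Invented vocabulary for the illustration.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}

print(or_query(expand_truncation("librar*", vocabulary)))
# librarian OR libraries OR library
```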

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
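What a proximity operator such as NEAR would do can be sketched as follows. The function name `near`, the default distance of 3 words, and both sample texts are assumptions made for this illustration only.

```python
def near(text, word_a, word_b, max_distance=3):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


doc1 = "open access journals are free of charge"
doc2 = "open source software and many other journals"

print(near(doc1, "open", "journals"))  # True: 2 words apart
print(near(doc2, "open", "journals"))  # False: 6 words apart
```

Both documents would match a plain AND query for the two words; only the first one matches the proximity query.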

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Diagram: User <-> an Internet meta-search system <-> Internet search system 1 (with its collected database 1) and Internet search system 2 (with its collected database 2) <-> WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
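The idea of grouping results on the fly can be sketched with a deliberately naive method: each result snippet is filed under the most frequent non-query word it contains. This is an assumption-laden toy, not the algorithm of Vivisimo or of any other system named here; the snippets, the stopword list and the function name are invented. It uses the "pascal" query mentioned earlier as an ambiguous search.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "by"}


def cluster_results(query, snippets):
    """Group snippets under the most frequent non-query word they contain."""
    def content_words(snippet):
        return [w for w in snippet.lower().split()
                if w not in STOPWORDS and w != query]

    # Count in how many snippets each word occurs.
    frequency = Counter(w for s in snippets for w in set(content_words(s)))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = content_words(snippet)
        # Ties are broken alphabetically (the later word wins) to stay deterministic.
        label = max(candidates, key=lambda w: (frequency[w], w)) if candidates else "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "pascal programming language tutorial",
    "free pascal compiler for programming",
    "blaise pascal philosopher and mathematician",
    "pensees by the philosopher blaise pascal",
]

for label, members in cluster_results("pascal", snippets).items():
    print(label, "->", members)
```

No document is analysed before the search: the grouping uses only the snippets returned for this one query.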

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine accessible through the WWW
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you use the
system.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
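How a client program can query several search systems in parallel is sketched below. This is not how Copernic is implemented: the two search functions are invented stand-ins that return fixed example.org URLs instead of contacting a real service, and `multi_search` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real search systems: each takes a query and returns a list of URLs.
def search_system_1(query):
    return [f"http://example.org/1/{query}"]


def search_system_2(query):
    return [f"http://example.org/2/{query}"]


def multi_search(query, systems):
    """Send the same query to all systems in parallel; return the results per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in systems.items()}
        return {name: future.result() for name, future in futures.items()}


systems = {"system 1": search_system_1, "system 2": search_system_2}
print(multi_search("thesaurus", systems))
```

Because the requests run in separate threads, the total waiting time is close to that of the slowest system, not the sum of all systems.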

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
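Integration with removal of repeated results can be sketched as follows. The normalisation rules (lower-case host, no "www.", no trailing slash) and the round-robin merging are assumptions chosen for the illustration; real meta-search systems may work quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]

print(merge_results(engine_1, engine_2))
# ['http://www.doaj.org/', 'http://www.scirus.com/', 'http://citeseer.ist.psu.edu/']
```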

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet (telnet, ftp, ..., WWW), with these labelled parts:]
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files
***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes;
also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
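The difference between modified pages and new pages can be sketched by comparing two snapshots. This is only an illustration: the URLs and page texts are invented, no page is actually downloaded, and the function names are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare two {url: text} snapshots; return (modified urls, new urls)."""
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    new = sorted(url for url in current if url not in previous)
    return modified, new


last_week = {"http://example.org/a": "version 1", "http://example.org/b": "same"}
today = {"http://example.org/a": "version 2", "http://example.org/b": "same",
         "http://example.org/c": "brand new"}

print(changed_pages(last_week, today))
# (['http://example.org/a'], ['http://example.org/c'])
```

A tracking service only needs to store the fingerprint of each page between visits, not the full text.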

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• The catalogues of libraries are useful mainly to find
older books.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
(scheme, centred on one Term)
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
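The procedure described above (look up a term in a thesaurus, then reuse the terms found in a query) can be sketched roughly as follows. This is only an illustration: the class `ThesaurusEntry`, the function `expand_query`, the sample entry for "sea" and the use of `OR` with quoted phrases are all assumptions, and the actual query syntax depends on the search system that is used.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the four relation types used in a thesaurus."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


def expand_query(entry: ThesaurusEntry, include_narrower: bool = True) -> str:
    """Build an OR-group from the term, its synonyms and, optionally, its narrower terms."""
    terms = [entry.term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


# Hypothetical entry, invented for illustration only.
sea = ThesaurusEntry(
    term="sea",
    broader=["body of water"],
    narrower=["inland sea"],
    related=["coast"],
    used_for=["ocean"],
)
print(expand_query(sea))
```

Adding synonyms and narrower terms mainly widens the query; broader and related terms are left out here because they tend to retrieve less specific documents.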

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
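The two directions of citation searching can be illustrated with a small sketch. The citation data below is invented, and the function names `snowball` and `citing_documents` are hypothetical; a real citation index works on a far larger scale.

```python
# Hypothetical citation data: each document maps to the documents it cites.
references = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed: str, refs: dict[str, list[str]]) -> set[str]:
    """Follow reference lists backwards in time, starting from the seed."""
    found: set[str] = set()
    to_visit = [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in refs.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def citing_documents(seed: str, refs: dict[str, list[str]]) -> set[str]:
    """Mimic a citation index lookup: documents that cite the seed."""
    return {doc for doc, cited in refs.items() if seed in cited}


print(sorted(snowball("seed", references)))
print(sorted(citing_documents("seed", references)))
```

The first function needs only the documents themselves, while the second needs an index built over many documents, which is why a citation index is required for it.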

211

***-

Information retrieval:
using citations (scheme)
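[Scheme: a time axis running from Past through Now to Future, with the
seed document on it. Snowball citation searching leads from the seed
document back to older documents; citation indexing leads from the
seed document forward to more recent documents.]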

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
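Generating these variants by hand is error-prone, so a small helper can produce them. The following is a minimal sketch; the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called index.html.

```python
def url_variants(url: str) -> list[str]:
    """Return the forms of a site address that other pages might link to."""
    base = url
    # Strip a trailing default file name, then any trailing slash.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search system, using the syntax given in its help pages.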

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 74

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
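The mechanism described above can be sketched in a few lines. This is a toy illustration, not how any particular search engine is implemented: the three pages stand in for what a robot would have collected, their URLs and texts are invented, and the function names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def build_index(collected: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in collected.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs that contain all query words (an implicit Boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the prebuilt index only; no page is fetched at search time, which is why results can lag behind the current state of the pages.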

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
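The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified iterative scoring scheme in the style of PageRank. This is a sketch under assumptions, not the algorithm that Google actually runs: the function name `link_rank`, the damping value of 0.85, the number of rounds and the four-page link graph are all chosen for illustration.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: page -> pages it links to.
web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}
scores = link_rank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page c is ranked above a and b because two pages point to it, and page d benefits from a single link that comes from the highly ranked page c.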

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
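When queries are composed by a program, the tilde can be added to selected words as in the following minimal sketch. The function name `add_synonym_operator` is hypothetical, and whether a given search system supports the operator at a given time should be checked in its help pages.

```python
def add_synonym_operator(query: str, words_to_expand: set[str]) -> str:
    """Prefix the selected query words with a tilde."""
    return " ".join(
        "~" + word if word in words_to_expand else word
        for word in query.split()
    )


print(add_synonym_operator("word1 word2 word3 word4", {"word2"}))
```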

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
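
To make the terms recall and precision concrete, here is a minimal sketch in Python. The document identifiers and the two result sets are invented for illustration only; they are not measurements of any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: four documents are relevant to the information need.
relevant_docs = {"d1", "d2", "d9", "d10"}

# A general index returns eight documents, two of them relevant.
general = precision_recall(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant_docs
)

# A specialised index returns four documents, three of them relevant.
specialised = precision_recall({"d1", "d2", "d9", "d11"}, relevant_docs)

print("general index:     precision %.2f, recall %.2f" % general)
print("specialised index: precision %.2f, recall %.2f" % specialised)
```

Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved. In this invented case the specialised system scores higher on both.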

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
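
For readers unfamiliar with truncation, the following sketch illustrates what a retrieval system that supports it does with a truncated query term. The `*` truncation symbol and the function name `truncation_pattern` are assumptions chosen for this illustration; real systems use various symbols and matching rules.

```python
import re

def truncation_pattern(term):
    """Turn a query term into a regex; a trailing '*' means 'any word ending'."""
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)
    # Without truncation, only the exact word matches.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

text = "The librarian visited two libraries and one public library."

matches_truncated = truncation_pattern("librar*").findall(text)
matches_exact = truncation_pattern("library").findall(text)

print(matches_truncated)
print(matches_exact)
```

In a system without truncation or stemming, the searcher has to enter the word variants explicitly, for instance combined with OR.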

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
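
As an illustration of what a proximity operator adds to a plain Boolean AND, here is a minimal sketch. The function name `near` and the default distance of 5 words are assumptions made for this example; systems that offer NEAR define the distance in their own way.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

doc = ("Open access journals are free. Many pages later, "
       "libraries pay for other journals by subscription.")

# Both queries would succeed with a plain Boolean AND,
# because all the words occur somewhere in the document.
close_together = near(doc, "open", "journals", max_distance=3)
far_apart = near(doc, "open", "subscription", max_distance=3)

print(close_together, far_apart)
```

The second query fails with NEAR because the two words are far apart, which is exactly the kind of filtering that is lost when no proximity operator is available.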

96

**--

Meta-search systems:
scheme 1
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
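
The idea of clustering results on the fly can be sketched as follows. This is NOT the algorithm used by Vivisimo, which is not described here; it is a deliberately simple stand-in that groups result snippets under the word they share most often with other snippets. The snippets, the stopword list and the function name `cluster_results` are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "for", "his", "of", "the"}

def cluster_results(query, snippets):
    """Group snippets under the non-query word that occurs in most snippets."""
    query_terms = set(query.lower().split())
    token_sets = [
        set(re.findall(r"\w+", s.lower())) - STOPWORDS - query_terms
        for s in snippets
    ]
    # Document frequency: in how many snippets does each word occur?
    doc_freq = Counter(t for tokens in token_sets for t in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        # Pick the most widely shared word; ties are broken alphabetically.
        label = min(tokens, key=lambda t: (-doc_freq[t], t)) if tokens else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Thoughts",
    "Free Pascal compiler for the Pascal programming language",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, "->", members)
```

Only the retrieved snippets are analysed, after the search has run, which is what "on the fly" means in the slide above.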

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
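
The principle of a multi-threaded, client-based search program can be sketched as below. This does not describe how Copernic works internally. The two "engines" are stand-in functions that return fixed strings instead of contacting real search systems, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real search systems: each takes a query and returns results.
def engine_one(query):
    return [f"engine-one result for {query}"]

def engine_two(query):
    return [f"engine-two result A for {query}",
            f"engine-two result B for {query}"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the result lists in the order of `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return {engine.__name__: results
            for engine, results in zip(engines, result_lists)}

results = meta_search("open access", [engine_one, engine_two])
for name, hits in results.items():
    print(name, hits)
```

Because the queries run in parallel, the total waiting time is roughly that of the slowest engine rather than the sum of all of them.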

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
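
The integration step with removal of repeated results can be sketched as follows. This is one simple possibility under stated assumptions, not the method of any particular meta-search system: URLs are compared after a crude normalisation (host without "www.", path without trailing slash, query string ignored), and the ranked lists are interleaved round-robin.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Crude comparison key: lower-case host without 'www.', path without trailing '/'."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists, keeping only the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
engine_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

The two spellings of the DOAJ address are recognised as the same result, so it appears only once in the merged list.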

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram components:
Internet services: WWW, telnet, ftp, ...
Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
Databases and file archives accessible through the Internet (CGI, ASP,...)
Information accessible only when passwords are used
Rapidly changing information, such as news
Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
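
How a tracking service can distinguish new, changed and unchanged pages can be sketched with a stored fingerprint per page. This is an assumption about a plausible mechanism, not a description of any specific service; the URL and the page texts are invented, and the page text is passed in directly so that the sketch runs without network access.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page text after collapsing whitespace, so layout-only changes are ignored."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_for_update(stored_fingerprints, url, page_text):
    """Return 'new', 'changed' or 'unchanged' and remember the latest fingerprint."""
    current = fingerprint(page_text)
    previous = stored_fingerprints.get(url)
    stored_fingerprints[url] = current
    if previous is None:
        return "new"
    return "unchanged" if previous == current else "changed"

store = {}  # in a real service this would be kept on the server
url = "http://www.example.org/news.html"  # hypothetical page

first = check_for_update(store, url, "Conference in Oostende")
second = check_for_update(store, url, "Conference   in\nOostende")
third = check_for_update(store, url, "Conference in Oostende, May 2005")

print(first, second, third)
```

A service would run such a check at regular intervals and send an alert by email whenever the outcome is "new" or "changed".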

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
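
The approach described above, looking up extra terms in a thesaurus first and then formulating the query, can be sketched in a few lines of Python. This is only an illustration under assumptions: the mini-thesaurus, its entries and the function name expand_query are hypothetical, and the OR / AND query syntax has to be adapted to the database that is actually searched.

```python
# Hypothetical mini-thesaurus: each term maps relation types (UF, NT, RT) to terms.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "RT": ["marine"]},
    "pollution": {"UF": ["contamination"], "NT": [], "RT": []},
}


def expand_query(terms, thesaurus=THESAURUS, relations=("UF", "NT")):
    """Return a Boolean query in which each term is ORed with its thesaurus terms."""
    groups = []
    for term in terms:
        entry = thesaurus.get(term, {})
        alternatives = [term]
        for rel in relations:
            for alt in entry.get(rel, []):
                if alt not in alternatives:
                    alternatives.append(alt)
        # Quote multi-word terms so that they are searched as phrases.
        quoted = [f'"{a}"' if " " in a else a for a in alternatives]
        groups.append("(" + " OR ".join(quoted) + ")")
    # Different concepts are combined with AND.
    return " AND ".join(groups)


query = expand_query(["sea", "pollution"])
print(query)
```

Adding synonyms and narrower terms with OR mainly increases recall; choosing the more suitable terms from the thesaurus is what helps precision.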

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
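
The two directions of citation searching can be illustrated with a small Python sketch. The citation data and the function names below are hypothetical; a real citation index is of course far larger and is queried through its own search interface.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites=CITES):
    """Follow reference lists backwards in time and collect all older documents reached."""
    found, todo = [], [doc]
    while todo:
        current = todo.pop()
        for ref in cites.get(current, []):
            if ref not in found:
                found.append(ref)
                todo.append(ref)
    return sorted(found)


def citing_documents(doc, cites=CITES):
    """Citation index lookup: which more recent documents cite `doc`?"""
    return sorted(d for d, refs in cites.items() if doc in refs)


print(snowball("seed"))          # older publications, found via reference lists
print(citing_documents("seed"))  # more recent publications, found via a citation index
```

Snowballing only needs the documents themselves, whereas the forward lookup requires an index that has already processed the reference lists of many publications.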

211

***-

Information retrieval:
using citations (scheme)
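[Scheme along a time axis (past, now, future), starting from the seed document:
snowball citation searching leads back into the past;
citation indexing leads forward into the future.]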

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
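
Generating these variants by hand is error-prone, so a small helper can produce them. The following Python sketch is an assumption-based illustration: the function name url_variants is hypothetical, and only the default file name index.html from the list above is handled.

```python
def url_variants(url):
    """Return the common spellings of a URL that other pages may link to."""
    # Drop the scheme so that the variants match the "//host/path" forms above.
    rest = url.split("://", 1)[-1]
    # Remove a trailing default file name, if present.
    if rest.endswith("/index.html"):
        rest = rest[: -len("index.html")]
    # Remove trailing slashes to obtain the shortest form.
    base = rest.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
for v in variants:
    print(v)
```

Each variant can then be entered in the link search of the chosen search system, using the syntax given in its help pages.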

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
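
The core of such a system, an index built automatically from page texts and a query function that consults only that index, can be sketched in Python. This is a minimal illustration, not how any particular search engine works: the pages, the URLs under example.org and the function names are hypothetical, and the crawling step is replaced by a ready-made dictionary.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index {word: set of URLs} from {url: page text}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical pages that a "robot" collected earlier.
collected = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}
index = build_index(collected)
print(search(index, "marine sea"))
```

Because a query is answered from the stored index and not from the live Internet, the answer is fast, but it can only reflect pages as they were when the robot last collected them.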

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the
first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
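
The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis iteration in the style of the published PageRank idea. This is a sketch under stated assumptions and not Google's actual algorithm: the small link graph, the damping value 0.85 and the number of iterations are chosen only for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical web: three pages link to "hub"; "hub" links to "favourite";
# "d" links to "lonely".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["favourite"], "d": ["lonely"]}
scores = link_rank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example "favourite" and "lonely" each receive exactly one link, but "favourite" scores higher because its link comes from the heavily linked page "hub".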

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
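
When queries are composed by a program, the tilde can be added automatically. The helper below is a hypothetical sketch that only builds the query string; whether the search system still honours the tilde has to be checked in its help pages.

```python
def mark_for_synonyms(words, expand):
    """Prefix the selected query words with a tilde and join all words with spaces."""
    return " ".join("~" + w if w in expand else w for w in words)


query = mark_for_synonyms(["word1", "word2", "word3", "word4"], expand={"word2"})
print(query)
```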

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
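
Recall and precision can be made concrete with a small calculation. The sketch below assumes that we know which documents are relevant, which is rarely the case on the real WWW; the function name and the sample data are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, three relevant documents exist, two overlap.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, round(r, 2))
```

A specialised index can score better on both measures because it contains fewer irrelevant documents and may cover its subject more completely.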

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
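
To illustrate what manual truncation means in systems that do support it, here is a minimal sketch. The wildcard character *, the function name and the small vocabulary are assumptions chosen for the example; they do not describe any particular retrieval system.

```python
def truncate_match(pattern, vocabulary):
    """Return the vocabulary words that match a right-truncated pattern.

    A pattern such as 'librar*' matches every word starting with 'librar';
    a pattern without '*' matches only the identical word.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(w for w in vocabulary if w.startswith(stem))
    return sorted(w for w in vocabulary if w == pattern)

vocabulary = ["librarian", "libraries", "library", "libretto", "journal"]
print(truncate_match("librar*", vocabulary))
# prints: ['librarian', 'libraries', 'library']
```

Without truncation, the searcher has to type each of these word forms separately and combine them with OR.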

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
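
What a proximity operator such as NEAR does can be sketched as follows. This is a simplified illustration: the function name, the default distance of 5 words and the whitespace tokenisation are assumptions, not the definition used by any specific system.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words."""
    tokens = text.lower().split()
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "open access journals are indexed in a directory of journals"
print(near(text, "open", "journals", max_distance=2))   # True
print(near(text, "open", "directory", max_distance=3))  # False
```

A plain AND query would accept both cases, because it only checks that both words occur somewhere in the document.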

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) - client computer + WWW client program - WWW server computer - Internet / WWW - WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) - client computer + multi-threaded Internet search client program - Internet / WWW - WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) - client computer + WWW client program - WWW server computer - Internet / WWW - WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
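
Clustering search results on the fly can be illustrated with a deliberately naive sketch. It is NOT the method used by Vivisimo, which is not described here; it simply groups result titles under the most frequent words that are not part of the query. The stop word list, the function name and the sample titles are invented for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles, n_labels=2):
    """Group result titles under the most frequent non-query terms."""
    query_terms = set(query.lower().split())

    def terms(title):
        return [t for t in title.lower().split()
                if t not in STOPWORDS and t not in query_terms]

    # Count in how many titles each candidate label occurs.
    counts = Counter(t for title in titles for t in set(terms(title)))
    labels = [term for term, _ in counts.most_common(n_labels)]

    clusters = defaultdict(list)
    for title in titles:
        label = next((l for l in labels if l in terms(title)), "other")
        clusters[label].append(title)
    return dict(clusters)

titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Turbo Pascal programming compiler",
    "Pascal philosopher and mathematician",
    "Pascal unit of pressure",
]
for label, group in cluster_results("pascal", titles).items():
    print(label, group)
```

The clusters are computed only from the results of one search, so no pre-processing of the whole document collection is needed.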

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) - client computer + multi-threaded Internet search client program - Internet / WWW - WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
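
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions: the result lists are given as plain lists of URLs, two URLs count as the same result when they differ only in "www." or a trailing slash, and the function names are invented. Real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeats."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

engine_a = ["http://www.example.org/a", "http://example.org/b"]
engine_b = ["http://example.org/a/", "http://example.org/c"]
print(merge_results([engine_a, engine_b]))
```

Interleaving by rank keeps the top results of every source near the top of the merged list.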

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
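
How a program can detect that a page has changed or is new can be sketched with a fingerprint comparison. This is only an illustration: the page texts are passed in as strings (no downloading is shown), and the function names and example URLs are invented. It does not describe how Copernic or any alert service works internally.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page text after collapsing whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def changed_pages(stored, current):
    """Return the URLs that are new or whose fingerprint differs.

    stored:  dict mapping URL -> fingerprint saved at the previous visit
    current: dict mapping URL -> page text retrieved now
    """
    return sorted(url for url, text in current.items()
                  if stored.get(url) != fingerprint(text))

stored = {"http://example.org/news": fingerprint("Old  headline")}
current = {
    "http://example.org/news": "Old headline",   # only whitespace differs
    "http://example.org/new": "A brand new page",
}
print(changed_pages(stored, current))
# prints: ['http://example.org/new']
```

Storing only a fingerprint per page keeps the tracking data small, at the cost of not knowing what exactly has changed.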

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
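
Matching new books against a stored interest profile of keywords can be sketched like this. The function name, the sample titles and the simple word matching are assumptions made for illustration; this is not how Amazon or any other bookshop implements its alerts.

```python
def matching_books(new_titles, profile_keywords):
    """Return the new titles that contain at least one profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    return [title for title in new_titles
            if keywords & set(title.lower().split())]

new_titles = [
    "Marine Ecology Handbook",
    "Cooking for Beginners",
    "Coastal ecology and management",
]
print(matching_books(new_titles, ["ecology"]))
# prints: ['Marine Ecology Handbook', 'Coastal ecology and management']
```

A profile based on subject categories would compare the category assigned to each new book instead of the words in its title.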

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result offers not only a thumbnail,
but also the origin, with a readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been enriched with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
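
The workflow above can be sketched in Python. This is only a minimal illustration under stated assumptions: the miniature thesaurus, its entries and the function name `expand_query` are hypothetical, and the `OR` syntax is assumed to be accepted by the target database.

```python
# Hypothetical miniature thesaurus; real systems hold far more entries.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],  # broader terms
        "NT": ["coastal sea"],    # narrower terms
        "RT": ["tide"],           # related terms
        "UF": ["ocean"],          # synonyms (use(d) for)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR query combining the term with the chosen thesaurus terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so a search system can treat them as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly helps recall; choosing a narrower term instead of the original one is a way to aim for higher precision.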

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
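
Both directions can be sketched with a small Python example. The document names and reference lists below are hypothetical toy data, and the functions are an illustration of the principle, not the method of any particular citation database.

```python
from collections import defaultdict

# Hypothetical toy data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, references):
    """Follow reference lists backwards in time, starting from one document."""
    found, todo = [], list(references.get(doc, []))
    while todo:
        current = todo.pop(0)
        if current not in found:
            found.append(current)
            todo.extend(references.get(current, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which documents cite it?"""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return dict(index)


older = snowball("seed", REFERENCES)
newer = build_citation_index(REFERENCES).get("seed", [])
print(older)
print(newer)
```

The reference lists are available in the documents themselves, whereas the inverted index has to be built by someone who has processed many documents; that is the service a citation index provides.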

211

***-

Information retrieval:
using citations (scheme)
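(Scheme: a time axis running from past to future.
Starting from the seed document, snowball citation searching
leads back to older publications, while citation indexing
leads forward to more recent publications.)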

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
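
As a minimal sketch of such an estimate in Python: the citation pairs and author names below are hypothetical toy data standing in for the results of a citation search. The counts remain estimates, because any citation database covers only part of the literature.

```python
from collections import Counter

# Hypothetical toy data: (citing article, cited article) pairs
# and the author of each cited article.
CITATIONS = [("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p5", "p2"), ("p5", "p1")]
AUTHOR_OF = {"p1": "Author A", "p2": "Author B"}

# Citations received per journal article.
per_article = Counter(cited for _citing, cited in CITATIONS)
# Citations received per author, summed over that author's articles.
per_author = Counter(
    AUTHOR_OF[cited] for _citing, cited in CITATIONS if cited in AUTHOR_OF
)

print(per_article.most_common())
print(per_author.most_common())
```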

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
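
A small Python sketch can generate these variants from any one of them. It assumes that the default file name on the server is `index.html` (other servers may use a different name), and the function name `url_variants` is hypothetical.

```python
def url_variants(url):
    """Return the spellings of a URL that a link search should try."""
    base = url
    # Strip the assumed default file name, then any trailing slash.
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search system, and the results combined.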

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
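
The "simply stated" version of this ranking can be sketched in Python. The site names and links are hypothetical toy data, and counting received links is only the simplification given above, not Google's actual link analysis.

```python
from collections import Counter

# Hypothetical toy data: (linking site, linked site) pairs found on the WWW.
LINKS = [
    ("site_a", "site_c"),
    ("site_b", "site_c"),
    ("site_a", "site_b"),
    ("site_d", "site_c"),
    ("site_c", "site_b"),
]

directory_category = ["site_a", "site_b", "site_c"]  # alphabetical, as in DMOZ

inbound = Counter(target for _source, target in LINKS)
# Sort by number of received links, highest first; ties stay alphabetical
# because Python's sort is stable.
ranked = sorted(directory_category, key=lambda site: -inbound[site])
print(ranked)
```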

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

(Scheme with the following labels:)
• the complete WWW
• a global subject directory, which can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
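
A minimal Python sketch of these two components follows. The pages, URLs and function names are hypothetical, the collecting "robot" is replaced by a ready-made dictionary, and Boolean AND between query words is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" has already collected from the Internet.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "History of the North Sea coast",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "north sea"))
print(search(INDEX, "marine"))
```

Note that a query only consults the index built beforehand; the pages themselves are not visited at search time, which is why results can lag behind the current state of the Internet.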

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
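
The two ranking criteria can be illustrated with a simplified, PageRank-style iteration in Python. This is a sketch under stated assumptions, not Google's actual algorithm: the toy web, the damping value of 0.85, the number of rounds and the function name `link_scores` are all choices made for this illustration.

```python
# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["c"],
    "e": ["d"],
}


def link_scores(links, damping=0.85, rounds=50):
    """Iteratively share each page's score among the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(rounds):
        new = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue  # pages without outgoing links pass nothing on here
            share = damping * score[page] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score


scores = link_scores(LINKS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy web, page c ranks first because many pages point to it, and page d ranks far above a, b and e although only two pages point to it, because one of them is the "important" page c.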

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
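
Such a query string can be assembled with a few lines of Python. The function name `mark_for_synonyms` and the example words are hypothetical; the sketch only builds the query text, and whether a search engine honours the tilde depends on that engine.

```python
def mark_for_synonyms(query, words_to_expand):
    """Prefix the selected query words with a tilde, as described above."""
    marked = []
    for word in query.split():
        marked.append("~" + word if word in words_to_expand else word)
    return " ".join(marked)


print(mark_for_synonyms("cheap marine research vessels", {"cheap"}))
```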

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
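
Recall and precision can be made concrete with a short calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers and the two result sets are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical situation: 4 relevant documents exist for a query.
relevant = {"d1", "d2", "d3", "d4"}
general = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
specialised = ["d1", "d2", "d3", "x1"]

print(precision_recall(general, relevant))      # (0.25, 0.5)
print(precision_recall(specialised, relevant))  # (0.75, 0.75)
```

In this invented case the specialised system finds more of the relevant documents (higher recall) and returns less noise (higher precision).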

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
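
When truncation is not offered, the searcher has to list the word variants explicitly. A minimal sketch of that workaround is given below; the word list and the function name expand_truncation are hypothetical, and the output is simply an OR-group that could be typed into a system that supports OR.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR-group."""
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    variants = sorted(w for w in set(vocabulary) if w.lower().startswith(stem))
    if not variants:
        # Nothing in the vocabulary matches: fall back to the bare stem.
        return stem
    return "(" + " OR ".join(variants) + ")"


# Hypothetical vocabulary of words the searcher considers.
words = ["library", "libraries", "librarian", "liberty", "librarians"]
print(expand_truncation("librar*", words))
# (librarian OR librarians OR libraries OR library)
```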

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
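
Clustering "on the fly" means that only the retrieved result list is analysed, after the search. The sketch below is NOT Vivisimo's algorithm, which is not described here; it is a deliberately naive stand-in that groups result snippets under their most frequent non-query words. The snippets, the stop-word list and the function name cluster_results are assumptions for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is", "his"}


def cluster_results(query, snippets, max_clusters=2):
    """Group snippets under the most frequent words that are not query words."""
    query_words = set(query.lower().split())

    def words(text):
        cleaned = {w.strip(".,:;!?()").lower() for w in text.split()}
        return cleaned - STOPWORDS - query_words - {""}

    counts = Counter(w for s in snippets for w in words(s))
    # Sort by frequency, then alphabetically, so the labels are deterministic.
    labels = sorted(counts, key=lambda w: (-counts[w], w))[:max_clusters]
    clusters = defaultdict(list)
    for s in snippets:
        label = next((l for l in labels if l in words(s)), "other")
        clusters[label].append(s)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensées",
    "Free Pascal compiler for the programming language",
]
clusters = cluster_results("pascal", snippets)
for label, items in clusters.items():
    print(label, len(items))
```

With these snippets the results fall into one group about the philosopher and one about the programming language, without any processing of the documents before the search.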

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
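
The client-based scheme above can be sketched in a few lines: one client program sends the same query to several search systems in parallel and collects the answers. The two "engines" below are local stand-in functions with invented URLs, not real search services; a real client would send requests over the Internet instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for remote search systems (hypothetical; no network access).
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel, one thread per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the result lists in the same order as the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return dict(zip((e.__name__ for e in engines), result_lists))


results = meta_search("oostende", [engine_a, engine_b])
for name, hits in results.items():
    print(name, hits)
```

Because the engines are queried at the same time, the total waiting time is roughly that of the slowest engine rather than the sum of all of them.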

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
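
The integration step with removal of repeated results can be sketched as follows. This is one possible approach, not the method of any particular meta-search system: result lists are interleaved by rank and a result is dropped when its normalised URL has already been seen. Treating "www.host" and "host" as the same site is an assumption that does not always hold.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce trivially different URLs to one key (host case, 'www.', trailing '/')."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(*result_lists):
    """Interleave result lists by rank and drop repeated results."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
index_1 = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/a"]
index_2 = ["http://vub.ac.be/BIBLIO", "http://example.org/b"]
print(merge_results(index_1, index_2))
# ['http://www.vub.ac.be/BIBLIO/', 'http://example.org/a', 'http://example.org/b']
```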

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is accessible through the Internet]
• telnet, ftp, ...
• WWW:
»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Databases and file archives accessible through
the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
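
The difference between "modified" and "new" pages can be shown with a small sketch. It is not the method of any specific alert service: it assumes that the tracking program stores a fingerprint (hash) of each page it has seen and compares it with the current content. The URLs, the page texts and the function names are invented; fetching the pages from the WWW is left out.

```python
import hashlib


def fingerprint(content):
    """Hash the page text, ignoring differences in whitespace only."""
    normalised = " ".join(content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored, current_pages):
    """Compare current page contents with stored fingerprints.

    Returns (new_urls, modified_urls) and updates `stored` in place.
    """
    new, modified = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            modified.append(url)
        stored[url] = digest
    return new, modified


# Hypothetical first and second visit to the same site.
store = {}
first = detect_changes(store, {"http://example.org/a": "Opening hours: 9-17"})
second = detect_changes(store, {
    "http://example.org/a": "Opening hours: 9-18",
    "http://example.org/b": "New page about open access",
})
print(first)   # (['http://example.org/a'], [])
print(second)  # (['http://example.org/b'], ['http://example.org/a'])
```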

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
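
Matching a new book against stored interest profiles can be sketched as below. This is an illustrative assumption about how such a service could work, not a description of Amazon's system; the user names, profiles, book records and the function name matching_profiles are invented.

```python
def matching_profiles(book, profiles):
    """Return the users whose stored keywords or subject categories match a new book."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    subjects = {s.lower() for s in book.get("subjects", [])}
    alerts = []
    for user, profile in profiles.items():
        keywords = {k.lower() for k in profile.get("keywords", [])}
        categories = {c.lower() for c in profile.get("categories", [])}
        # Alert when a keyword occurs in the title or a category matches a subject.
        if keywords & title_words or categories & subjects:
            alerts.append(user)
    return sorted(alerts)


# Hypothetical interest profiles stored on the server.
profiles = {
    "anna": {"keywords": ["oceanography"]},
    "ben": {"categories": ["Information science"]},
    "carl": {"keywords": ["chemistry"]},
}
book_1 = {"title": "Introduction to Oceanography", "subjects": ["Marine science"]}
book_2 = {"title": "Searching the Web", "subjects": ["Information science"]}

print(matching_profiles(book_1, profiles))  # ['anna']
print(matching_profiles(book_2, profiles))  # ['ben']
```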

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
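The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the small THESAURUS dictionary, the function names and the AND/OR query syntax are hypothetical, and the syntax actually required depends on the database that is searched.

```python
# Hypothetical mini-thesaurus; in practice the terms would be looked up
# in one or several real thesaurus systems.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "BT": ["body of water"],   # broader terms
        "NT": ["bay", "gulf"],     # narrower terms
        "RT": ["coast"],           # related terms
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the thesaurus terms found for the chosen relations."""
    expanded = [term]
    entry = thesaurus.get(term.lower(), {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, thesaurus, relations=("UF", "NT")):
    """Combine the alternatives for one concept with OR and the concepts with AND."""
    groups = []
    for term in terms:
        alternatives = expand_term(term, thesaurus, relations)
        # Put multi-word terms between quotes so that they are searched as phrases.
        quoted = ['"%s"' % a if " " in a else a for a in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"], THESAURUS))
```

Adding synonyms and narrower terms with OR tends to increase recall; choosing more suitable terms from the thesaurus can also help precision.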

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
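The two directions of citation searching can be illustrated with a small sketch. The REFERENCES data and the function names are hypothetical; a real citation index holds the same kind of information (which document cites which) on a much larger scale.

```python
# Hypothetical data: each document is mapped to the documents in its reference list.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
    "new3": ["new1"],
}


def backward_snowball(seed, references):
    """Follow reference lists from the seed document to ever older documents."""
    found = []
    queue = [seed]
    while queue:
        current = queue.pop(0)
        for cited in references.get(current, []):
            if cited != seed and cited not in found:
                found.append(cited)
                queue.append(cited)
    return found


def citing_documents(target, references):
    """Return the documents whose reference list contains the target document.

    This is the lookup that a citation index makes possible.
    """
    return sorted(doc for doc, cited in references.items() if target in cited)


print(backward_snowball("seed", REFERENCES))
print(citing_documents("seed", REFERENCES))
```

The first function only needs the documents themselves; the second needs an index that covers the later literature, which is why citation indexes are a separate kind of tool.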

211

***-

Information retrieval:
using citations (scheme)
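Scheme, along a time axis (past, now, future):
»the seed document is the starting point (now)
»snowball citation searching leads from the seed document towards the past
»citation indexing leads from the seed document towards the future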

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
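Generating the variants listed above can be automated with a short sketch like the following. The function name, the default index file name and the optional query prefix are assumptions made for illustration; the link-search syntax itself differs between search systems and should be taken from their help pages.

```python
def link_variants(site_url, index_name="index.html", prefix=""):
    """Return the three common spellings of the address of a web site directory.

    site_url may be given with or without the index file or the trailing slash.
    prefix is an optional, system-specific link-search operator (hypothetical here).
    """
    base = site_url
    if base.endswith("/" + index_name):
        base = base[: -len(index_name)]
    base = base.rstrip("/")
    variants = [base + "/" + index_name, base + "/", base]
    return [prefix + variant for variant in variants]


for query in link_variants("//web-server-computer.country/website/"):
    print(query)
```

Running one search per variant and merging the results reduces the risk of missing links that use a different spelling of the same address.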

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 77

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Within the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
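The mechanism can be sketched on a very small scale as follows. The PAGES data stands in for what a “robot” would collect, and the function names are hypothetical; real systems add ranking, many file formats and far larger databases.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image search",
    "http://example.org/c": "Open archives for science",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: each word is mapped to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all the words of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


index = build_index(PAGES)
print(search(index, "marine science"))
```

A query is answered from the index only, not by visiting the pages again, which is why such systems can respond quickly and why their results can be out of date.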

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine Alltheweb and the Internet
database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
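The idea that a page ranks higher when many pages, or “important” pages, point to it can be illustrated with a simplified PageRank-style calculation. This is a sketch under assumptions, not the algorithm actually used by Google: the LINKS data, the damping value of 0.85 and the fixed number of iterations are all chosen for illustration only.

```python
# Hypothetical link structure: each page is mapped to the pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "D": ["C"],
    "C": ["E"],
    "F": ["G"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Return a score per page; the scores of all pages add up to 1."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


scores = pagerank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example C scores higher than A because three pages point to it, and E scores higher than G although each receives only one link, because the link to E comes from the “important” page C.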

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
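
Building such a query string can be sketched in a few lines. The function name and the example words are hypothetical; the code only builds the text of the query and does not contact any search system.

```python
def add_synonym_operator(words, expand):
    """Put a tilde in front of the selected words of a query."""
    return " ".join("~" + w if w in expand else w for w in words)


query = add_synonym_operator(
    ["word1", "word2", "word3", "word4"],
    expand={"word2"},  # the words for which synonyms are requested
)
```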

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
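
Manual expansion of a query can be sketched as follows: each concept is replaced by a group of alternative terms that the searcher has chosen. The synonym table is a hand-made, hypothetical example, and the AND / OR / parentheses notation is generic Boolean notation that has to be adapted to the syntax of the retrieval system that is used.

```python
def expand_query(concepts, synonyms):
    """Combine each concept with its hand-picked alternatives into one Boolean query."""
    groups = []
    for term in concepts:
        alternatives = [term] + synonyms.get(term, [])
        # Phrases of more than one word are put between quotation marks.
        quoted = ['"%s"' % a if " " in a else a for a in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


# Hypothetical synonym table made by the searcher.
synonyms = {
    "library": ["libraries", "information centre"],
    "catalogue": ["catalog", "OPAC"],
}
query = expand_query(["library", "catalogue", "marine"], synonyms)
```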

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
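
Recall and precision, the two quality measures for a result set named on this slide, can be computed as in the following sketch. The document identifiers are hypothetical, and the sketch assumes that the complete set of relevant documents is known, which in practice is rarely the case on the WWW.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["d1", "d2", "d3", "d4"]  # what the search system returned
relevant = ["d2", "d4", "d5"]         # what the searcher actually needed
precision, recall = precision_recall(retrieved, relevant)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).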

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
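
When a search system does not offer truncation, the searcher can write out the word variants and combine them. The sketch below expands a right-truncated term against a word list; the vocabulary list and the function name are hypothetical, and OR stands for generic Boolean notation.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words that match a right-truncated pattern such as 'librar*'."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


# Hypothetical word list, for instance taken from a thesaurus or dictionary.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
variants = expand_truncation("librar*", vocabulary)
or_query = " OR ".join(variants)
```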

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
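
Clustering on the fly can be illustrated with a toy sketch that groups result snippets under a label term that several snippets share. This is only an illustration of the general idea for an ambiguous query such as pascal; the snippets, the stop word list and the labelling rule are assumptions, and it is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}


def cluster_results(query, snippets, max_clusters=3):
    """Group snippets under the most widely shared term that is not a query word."""
    query_words = set(query.lower().split())
    tokenised = []
    for text in snippets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        tokenised.append(words - STOP_WORDS - query_words)
    # Count in how many snippets each remaining term occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    # Candidate labels occur in at least two snippets; ties are broken alphabetically.
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [w for w, n in ordered if n >= 2][:max_clusters]
    clusters = defaultdict(list)
    for text, words in zip(snippets, tokenised):
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(text)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal compiler for the Pascal programming language",
]
clusters = cluster_results("pascal", snippets)
```

The grouping is computed from the retrieved snippets only, at the time of the search, so no document has to be processed beforehand.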

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
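
The integration of results with removal of repeated results can be sketched as follows. The ranked lists are hypothetical, and the normalisation rule (ignoring a leading "www.", the scheme and a trailing slash) is an assumption; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key so that trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(result_lists):
    """Interleave ranked lists from several systems, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/", "http://example.org/page1"]
engine_b = ["http://example.org", "http://example.com/other",
            "http://www.example.org/page1/"]
merged = merge_results([engine_a, engine_b])
```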

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Components shown:
• Internet / WWW
• telnet, ftp, ...
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
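
Tracking modifications in an existing page can be sketched by storing a fingerprint of the page and comparing it on the next visit. The URL and the page contents are hypothetical and no page is actually downloaded here; the use of SHA-256 is an assumption, and real alert services may use other techniques.

```python
import hashlib


def fingerprint(page_content: bytes) -> str:
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(page_content).hexdigest()


def check_for_update(url, page_content, stored):
    """Return 'new', 'changed' or 'unchanged' and remember the latest fingerprint."""
    digest = fingerprint(page_content)
    previous = stored.get(url)
    stored[url] = digest
    if previous is None:
        return "new"
    return "changed" if previous != digest else "unchanged"


stored = {}  # fingerprints remembered between visits
url = "http://www.example.org/news.html"  # hypothetical page
first = check_for_update(url, b"<html>version 1</html>", stored)
second = check_for_update(url, b"<html>version 1</html>", stored)
third = check_for_update(url, b"<html>version 2</html>", stored)
```

The three visits give "new", "unchanged" and "changed"; an alert would be sent to the user/subscriber only in the first and the last case.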

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
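
Matching new books against stored interest profiles can be sketched as follows, for the case of profiles in the form of keywords. The user names, keywords and book titles are hypothetical; real services may also match on subject categories, authors or descriptions.

```python
def matches_profile(title, keywords):
    """True when at least one keyword of the profile occurs as a word in the title."""
    words = set(title.lower().replace(",", " ").replace(":", " ").split())
    return any(k.lower() in words for k in keywords)


def new_book_alerts(new_titles, profiles):
    """Return, per user, the new titles that fit the stored interest profile."""
    alerts = {}
    for user, keywords in profiles.items():
        hits = [t for t in new_titles if matches_profile(t, keywords)]
        if hits:
            alerts[user] = hits
    return alerts


# Hypothetical interest profiles stored on the server.
profiles = {
    "user1": ["oceanography", "marine"],
    "user2": ["thesaurus"],
}
new_titles = [
    "Introduction to Marine Biology",
    "Cooking for Beginners",
    "Oceanography: an overview",
]
alerts = new_book_alerts(new_titles, profiles)
```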

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
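
The use of a thesaurus to prepare a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample entry for "sea", the names ThesaurusEntry, expand_term and build_query, and the OR/AND query syntax are all hypothetical and are not taken from any particular thesaurus or database system.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the classical thesaurus relations."""
    term: str
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (synonyms)


# Hypothetical sample entry, invented for illustration only.
THESAURUS = {
    "sea": ThesaurusEntry(
        term="sea",
        broader=["body of water"],
        narrower=["coastal sea"],
        related=["marine science"],
        used_for=["ocean"],
    ),
}


def expand_term(term, thesaurus, include_narrower=True, include_related=False):
    """Return the term plus synonyms (UF) and, optionally, NT and RT terms."""
    entry = thesaurus.get(term)
    if entry is None:
        return [term]  # unknown term: search it as typed
    candidates = [term] + entry.used_for
    if include_narrower:
        candidates += entry.narrower
    if include_related:
        candidates += entry.related
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))


def build_query(terms, thesaurus):
    """Combine the expanded terms: OR within a concept, AND between concepts."""
    groups = []
    for term in terms:
        variants = ['"%s"' % v if " " in v else v
                    for v in expand_term(term, thesaurus)]
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"], THESAURUS))
```

Adding synonyms and narrower terms mainly serves recall; leaving out related terms by default is a choice made here to protect precision, and the searcher can switch it on when too little is found.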

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information is becoming available
online.
• A growing share of this online information is becoming
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
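
The two directions of citation searching can be sketched with a small Python example. The document names and reference lists below are invented for illustration, and the function names are hypothetical; real citation indexes are of course much larger and are not built in this simple way.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old1"],
}


def backward_snowball(seed, references):
    """Follow reference lists from the seed to ever older documents."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for cited in references.get(doc, []):
            if cited != seed and cited not in found:
                found.append(cited)
                queue.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


def citing_documents(doc, citation_index):
    """Look up the more recent documents that cite the given document."""
    return citation_index.get(doc, [])


index = build_citation_index(REFERENCES)
print(backward_snowball("seed", REFERENCES))   # older publications
print(citing_documents("seed", index))         # more recent publications
print(len(citing_documents("old1", index)))    # citations received by old1
```

The reference lists alone are enough for the search into the past; the search towards more recent documents needs the inverted table, which is why it depends on a citation index that someone has built in advance.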

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
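
The variants listed above can be generated automatically before searching. The following Python sketch is only an illustration: the function name url_variants is hypothetical, and the default file names index.html and index.htm are an assumption, since a web server may be configured to use other names.

```python
def url_variants(url):
    """Return the common spellings under which a page may be linked."""
    # Drop the scheme so that the variants match the "//server/path" form.
    rest = url.split("://", 1)[-1]
    variants = []
    base = rest
    # Assumed default file names; a real server may use others.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            variants.append(rest)
            break
    base = base.rstrip("/")
    variants.append(base + "/")
    variants.append(base)
    return ["//" + v for v in variants]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then used in a separate link search, and the results are combined by the searcher.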

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
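
The mechanism described above can be sketched as a tiny inverted index in Python. This is a simplified illustration, not the implementation of any real search engine: the three pages are invented, the crawler is assumed to have collected them already, and the names tokenize, build_index and search are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index in advance: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(query, index):
    """Answer a query from the index only; no page is fetched at query time."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search("marine biology", INDEX))
print(search("marine", INDEX))
```

Because the query is answered from the prepared index, the result reflects the pages as they were when the robot last collected them, not as they are at the moment of searching.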

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
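
The link-based ranking idea on this slide can be illustrated with a minimal sketch. It is a simplified power-iteration scheme in the spirit of link analysis, NOT Google's actual implementation; the function name link_rank, the damping value 0.85 and the toy page names are assumptions made only for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by their incoming links.

    links maps each page to the list of pages it points to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for t in pages:
                    new[t] += share
        score = new
    return score


# Hypothetical toy web: three pages point to "hub", and "hub" points to "a".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = link_rank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy web, "hub" ranks first because many pages point to it, and "a" ranks second because an "important" page points to it, although only one page does.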

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
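
As a small illustration, such a query string can be assembled as follows. The sketch only builds the text of the query as described on this slide; it does not contact any search system, and the function name expand_query is a hypothetical helper.

```python
def expand_query(words, expand):
    """Return a query string in which each word listed in `expand`
    gets a leading tilde (the synonym operator described on the slide)."""
    expand_set = set(expand)
    return " ".join("~" + w if w in expand_set else w for w in words)


# Reproduces the example query from the slide.
query = expand_query(["word1", "word2", "word3", "word4"], expand=["word2"])
```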

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
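
Recall and precision, mentioned on this slide, can be computed for a single search as sketched below. The document identifiers are invented for the illustration, and the function name precision_recall is a hypothetical helper.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist,
# and 2 of the relevant ones are among the retrieved documents.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here precision is 2/4 and recall is 2/3. A specialised system aims to raise both, by covering more of the relevant sources and fewer irrelevant ones.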

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
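
Clustering of results on the fly can be illustrated with a toy sketch. This is NOT the algorithm used by Vivisimo, which the slides do not describe; it simply groups result snippets under the most widely shared term that is not part of the query. The stop word list, the snippets and the function name cluster_results are assumptions made for the example.

```python
from collections import Counter, defaultdict
import re

# Small illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "by"}


def cluster_results(query, snippets):
    """Group snippets under a heading: the most widely shared term
    that is neither a stop word nor a query word."""
    query_terms = set(query.lower().split())
    term_sets = []
    for text in snippets:
        terms = set(re.findall(r"[a-z]+", text.lower()))
        term_sets.append(terms - STOPWORDS - query_terms)
    # Count in how many snippets each term occurs.
    doc_freq = Counter(t for terms in term_sets for t in terms)
    clusters = defaultdict(list)
    for text, terms in zip(snippets, term_sets):
        # Most widely shared term first; ties are broken alphabetically.
        heading = min(terms, key=lambda t: (-doc_freq[t], t)) if terms else "other"
        clusters[heading].append(text)
    return dict(clusters)


# Invented snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Turbo Pascal programming compiler",
    "Pascal philosopher and mathematician",
]
groups = cluster_results("pascal", snippets)
```

The snippets are grouped at search time only, without any index of the documents built in advance, which is the point made on this slide.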

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
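
The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions: each primary search system is represented by an already retrieved, ranked list of URLs, the example URLs are invented, and merge_results is a hypothetical function name, not part of any existing meta-search system.

```python
def merge_results(result_lists, limit=10):
    """Interleave ranked URL lists from several search systems,
    dropping URLs that were already returned by another system."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation (a simplification): ignore case
                # and a trailing slash when comparing URLs.
                key = results[rank].lower().rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged[:limit]


# Invented result lists from two hypothetical primary search systems.
engine_a = ["http://example.org/a", "http://example.org/b"]
engine_b = ["http://example.org/b/", "http://example.org/c"]
merged = merge_results([engine_a, engine_b])
```

The `limit` parameter also illustrates a disadvantage of meta-search systems: only a limited number of the results available from the primary systems are shown.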

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Parts of the Internet shown:]
»telnet, ftp, ...
»WWW: static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Databases and file archives accessible through the Internet
»CGI, ASP,...
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
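
How a tracking system can notice that an existing page was modified can be sketched as below. This is only an illustration of the principle, not the method of any particular alert service; downloading the page is left out, and the function names fingerprint and has_changed are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, new_page_text):
    """Compare a stored digest with the digest of the page as it is now."""
    return fingerprint(new_page_text) != old_fingerprint


# The digest is stored at the first visit and compared at later visits.
stored = fingerprint("Conference programme: 12 talks")
unchanged = has_changed(stored, "Conference  programme:\n12 talks")
modified = has_changed(stored, "Conference programme: 13 talks")
```

Storing only a digest per page keeps the tracking service small, but it can report only that a page changed, not what changed.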

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
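The procedure above (look up a term in a thesaurus first, then build the query) can be sketched in a few lines of Python. This is only an illustration: the mini-thesaurus `THESAURUS`, the function names and the `OR` query syntax are assumptions made for the example, not features of any particular system. The second function shows how precision and recall are computed for a set of retrieved documents.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "BT": ["water body"], "NT": ["bay"], "RT": ["coast"]},
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its synonyms (UF) and narrower terms (NT) in an OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return " OR ".join(f'"{t}"' for t in terms)

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(expand_query("sea", THESAURUS))
print(precision_recall([1, 2, 3, 4], [2, 4, 6]))
```

Adding synonyms and narrower terms tends to raise recall; choosing more suitable terms tends to raise precision.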

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
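The two directions of citation searching can be illustrated with a small Python sketch. The citation data in `CITES` and the document names are invented for the example; a real citation index would supply them.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "seed"],
}

def snowball(doc, cites):
    """Follow reference lists backwards in time, repeatedly (snowball effect)."""
    found, stack = set(), [doc]
    while stack:
        for ref in cites.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found

def cited_by(doc, cites):
    """Citation index lookup: more recent documents that cite the given document."""
    return {source for source, refs in cites.items() if doc in refs}

print(sorted(snowball("seed", CITES)))
print(sorted(cited_by("seed", CITES)))
```

The first function needs only the reference lists of the documents themselves; the second needs a citation index that has processed many documents.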

211

***-

Information retrieval:
using citations (scheme)
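Seed document (now)
• Snowball citation searching leads from the seed document
to the past (older publications).
• Citation indexing leads from the seed document
to the future (more recent publications).
(Time axis: past, now, future)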

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
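Generating the variants listed above can be automated. The following Python sketch is a minimal illustration; the function names are invented here, and the `link:` prefix is only an assumed example of a link-search operator, so check the help pages of the search system for its real syntax.

```python
def url_variants(url):
    """Return the spellings of a URL that other pages may link to."""
    rest = url.split("://", 1)[-1]         # drop "http://" if present
    variants = [rest]
    if rest.endswith("/index.html"):
        rest = rest[: -len("index.html")]  # keep the trailing slash
        variants.append(rest)
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants

def link_queries(url, operator="link:"):
    """Build one query per variant; the operator is an assumption."""
    return [operator + variant for variant in url_variants(url)]

for query in link_queries("http://web-server-computer.country/website/index.html"):
    print(query)
```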

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
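The "simply stated" rule above (more links received means a higher rank) can be sketched in Python as follows. The link data and site names are invented for the example, and the real ranking used by Google is more elaborate than a plain count.

```python
from collections import Counter

def rank_by_inlinks(links, candidates):
    """Sort candidate sites by the number of links they receive (most first)."""
    counts = Counter(target for _source, target in links)
    # Ties are broken alphabetically, as in a plain alphabetical directory.
    return sorted(candidates, key=lambda site: (-counts[site], site))

# Hypothetical (source, target) link pairs found on the WWW.
LINKS = [("a", "x"), ("b", "x"), ("c", "y")]
print(rank_by_inlinks(LINKS, ["y", "x", "z"]))
```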

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big, machine-built
index that is based on the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
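The two components described above can be illustrated with a very small inverted index in Python. This is a sketch under simplifying assumptions: the pages are given as a dictionary instead of being collected by a robot, the example URLs and texts are invented, and a query simply requires all of its words to occur on a page.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages that a crawler might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Sea level and hydrology",
}
INDEX = build_index(PAGES)
print(sorted(search(INDEX, "sea hydrology")))
```

The query is answered from the prepared index only, which is why such systems do not need to visit the Internet at the moment the user searches.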

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine Alltheweb and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
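The idea of ranking by incoming links can be illustrated with a small sketch. This is a simplified, generic link-analysis iteration and not Google's actual algorithm; the function name `link_rank`, the damping value and the toy web are assumptions made for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]}.

    A page scores higher when many pages link to it and when the
    linking pages themselves have a high score.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score to each target.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages point to "hub".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "leaf"],
    "hub": ["a"],
    "leaf": [],
}
scores = link_rank(toy_web)
best = max(scores, key=scores.get)
```

In this toy web, "hub" ends up with the highest score because most pages point to it, and page "a" benefits in turn from being linked by "hub".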

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
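Manual expansion of a query with thesaurus terms can be sketched as follows. The thesaurus content, the function name `expand_query` and the use of an uppercase OR between alternatives are assumptions for illustration; the exact Boolean syntax differs between retrieval systems.

```python
def expand_query(terms, thesaurus):
    """Replace each term by an OR-group of the term and its thesaurus entries."""
    parts = []
    for term in terms:
        alternatives = [term] + thesaurus.get(term, [])
        # Multi-word alternatives are quoted so that they stay together.
        quoted = ['"%s"' % a if " " in a else a for a in alternatives]
        if len(quoted) > 1:
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(quoted[0])
    return " ".join(parts)


# Hypothetical mini-thesaurus with synonyms for one term.
toy_thesaurus = {"car": ["automobile", "motor vehicle"]}
query = expand_query(["car", "emissions"], toy_thesaurus)
```

The resulting query string covers the search concept "car" with three alternative terms, while "emissions" is left unchanged.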

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
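A query of this form can also be built by a small helper. The function name `mark_for_synonyms` and the example words are hypothetical; the sketch only produces the query string described above and does not contact any search system.

```python
def mark_for_synonyms(words, expand):
    """Put a tilde in front of every word that should be expanded."""
    return " ".join("~" + w if w in expand else w for w in words)


query = mark_for_synonyms(["cheap", "laptop", "review"], {"laptop"})
```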

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
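Recall and precision can be computed when the set of relevant documents is known. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were retrieved (recall about 0.67).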

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
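What manual truncation does in systems that support it can be sketched as follows. The asterisk as truncation symbol, the function name `truncation_matches` and the small vocabulary are assumptions for illustration; real systems use various symbols.

```python
def truncation_matches(pattern, vocabulary):
    """Return the index words matched by a query word such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(w for w in vocabulary if w.startswith(stem))
    return sorted(w for w in vocabulary if w == pattern)


# Hypothetical index vocabulary.
vocab = {"library", "libraries", "librarian", "liberty"}
matches = truncation_matches("librar*", vocab)
```

Without truncation, a searcher would have to enter each word form separately to retrieve the same documents.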

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
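A proximity operator such as NEAR can be sketched as a check on word positions. The function name `near` and the default distance of 5 words are assumptions; systems that offer NEAR define the distance in their own way.

```python
def near(text, word_a, word_b, max_distance=5):
    """True when word_a and word_b occur within max_distance words."""
    tokens = text.lower().split()
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sentence = "open access journals are free"
close_enough = near(sentence, "open", "free", max_distance=5)
too_far = near(sentence, "open", "free", max_distance=2)
```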

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
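Clustering on the fly can be illustrated with a very simple single-pass method that groups result titles by word overlap. This is only a sketch under assumptions: it is not the method used by Vivisimo, and the function name, the threshold and the example titles are invented.

```python
def cluster_titles(titles, query_term, threshold=0.15):
    """Group titles whose words overlap with the first title of a cluster."""
    clusters = []  # each cluster: {"words": words of first title, "titles": [...]}
    for title in titles:
        # The query term occurs in every hit, so it is ignored.
        words = set(title.lower().split()) - {query_term}
        for cluster in clusters:
            overlap = len(words & cluster["words"])
            union = len(words | cluster["words"])
            if union and overlap / union >= threshold:
                cluster["titles"].append(title)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append({"words": words, "titles": [title]})
    return [c["titles"] for c in clusters]


# Hypothetical hits for the ambiguous query "pascal".
hits = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal philosopher Pensees",
    "Free Pascal programming compiler",
]
groups = cluster_titles(hits, "pascal")
```

The titles are grouped at search time, using only the retrieved results; nothing has to be classified in advance.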

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
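The integration of results with removal of repeated results can be sketched as follows. The normalisation rule (ignore "www." and a trailing slash) and the round-robin merging are assumptions made for illustration; actual meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key, so that small variants count as repeats."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeats."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([engine_1, engine_2])
```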

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
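The difference between modified and new pages can be sketched with stored fingerprints of page contents. This is an assumed, minimal approach using a hash of the page text; the URLs are placeholders and the sketch does not download anything.

```python
import hashlib


def fingerprint(page_text):
    """Short, fixed-length summary of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare {url: fingerprint} from the last visit with {url: text} now."""
    modified = [
        url for url, text in current.items()
        if url in previous and previous[url] != fingerprint(text)
    ]
    new = [url for url in current if url not in previous]
    return modified, new


# Hypothetical state of the last visit and of the current visit.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
modified, new = changed_pages(previous, current)
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber.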

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ states that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
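Matching new books against a stored interest profile can be sketched as follows. The profile structure, the field names and the book records are hypothetical; they do not describe how Amazon or any other service stores profiles.

```python
def matches_profile(book, profile):
    """True when a title word or the subject category fits the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile: keywords and subject categories.
profile = {
    "keywords": {"oceanography", "marine"},
    "categories": {"Earth sciences"},
}
new_books = [
    {"title": "Marine ecology handbook", "category": "Biology"},
    {"title": "Medieval poetry", "category": "Literature"},
    {"title": "Plate tectonics", "category": "Earth sciences"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```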

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
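Simultaneous searching over several catalogues can be sketched with parallel threads. The two catalogue functions below are stand-ins invented for illustration (one of them simulates an unavailable target system); a real service would send queries over the network, for instance using Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, catalogues):
    """Send one query to all catalogues in parallel.

    catalogues: {name: function(query) -> list of titles}
    """
    def safe(search):
        try:
            return search(query)
        except Exception:
            return []  # an unavailable target contributes nothing

    with ThreadPoolExecutor(max_workers=len(catalogues) or 1) as pool:
        futures = {name: pool.submit(safe, fn) for name, fn in catalogues.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical target systems.
def library_a(query):
    return ["Book about " + query]


def bookshop_b(query):
    raise ConnectionError("target system unavailable")


results = search_all("water", {"library_a": library_a, "bookshop_b": bookshop_b})
```

The empty list for the second target shows how the overall result depends on the availability of each target system.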

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning
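
The relations in this scheme can be sketched as a small data structure. This is only an illustration: the class name, the field names and the sample entry for "ocean" are assumptions invented for this example, not taken from any particular thesaurus system.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One preferred term with its standard thesaurus relations."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical sample entry, invented for illustration only.
ocean = ThesaurusEntry(
    term="ocean",
    broader=["water body"],
    narrower=["Atlantic Ocean", "Pacific Ocean"],
    related=["sea", "marine science"],
    used_for=["oceans"],
)


def describe(entry: ThesaurusEntry) -> list[str]:
    """Render the entry in the usual BT / NT / RT / UF notation."""
    lines = [entry.term]
    for label, terms in (("BT", entry.broader), ("NT", entry.narrower),
                         ("RT", entry.related), ("UF", entry.used_for)):
        lines.extend(f"  {label} {t}" for t in terms)
    return lines


print("\n".join(describe(ocean)))
```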

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
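
The approach described above can be sketched in a few lines of Python. The mini-thesaurus and the function name are hypothetical, and the Boolean syntax (OR, AND, parentheses) is an assumption: the exact query syntax differs from one search system to another.

```python
# Hypothetical mini-thesaurus: term -> extra terms found by the searcher.
THESAURUS = {
    "sea": ["ocean", "marine"],
    "pollution": ["contamination"],
}


def expand_query(concepts: list[str], thesaurus: dict[str, list[str]]) -> str:
    """OR each concept with its thesaurus terms, then AND the concepts."""
    groups = []
    for concept in concepts:
        terms = [concept] + thesaurus.get(concept, [])
        # Quote multi-word terms so that they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(expand_query(["sea", "pollution"], THESAURUS))
```

Adding synonyms with OR aims at higher recall, while combining the concepts with AND keeps precision under control.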

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
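
Both directions of citation searching can be illustrated with a toy citation graph. The document identifiers and the data below are hypothetical; a real citation index holds the same kind of information on a much larger scale.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str, references: dict[str, list[str]]) -> set[str]:
    """Backward search: follow reference lists from the seed to older work."""
    found, todo = set(), [seed]
    while todo:
        for cited in references.get(todo.pop(), []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def citing_documents(target: str, references: dict[str, list[str]]) -> set[str]:
    """Forward search, as offered by a citation index: who cites the target?"""
    return {doc for doc, cited in references.items() if target in cited}


print(sorted(snowball("seed", REFERENCES)))
print(sorted(citing_documents("seed", REFERENCES)))
```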

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
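
Counting citations per article and per author from the results of a citation search could look like the sketch below. The records are hypothetical, and any such count is only an estimate, because it depends on the coverage of the database that was searched.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of cited article).
CITATIONS = [
    ("paper-A", "paper-X", "Author 1"),
    ("paper-B", "paper-X", "Author 1"),
    ("paper-B", "paper-Y", "Author 1"),
    ("paper-C", "paper-Z", "Author 2"),
]

# Number of citations received by each cited article.
per_article = Counter(cited for _, cited, _ in CITATIONS)
# Number of citations received by each author, over all their articles.
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```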

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
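
A small helper can generate these variants so that none is forgotten. This is a sketch under one assumption: that the default document of the site is called index.html (or index.htm), which is common but not universal. The function name is hypothetical.

```python
def url_variants(url: str) -> list[str]:
    """Return common spellings of a site URL that other pages may link to."""
    base = url
    # Strip an explicit default document name, if present (assumed names).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search system.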

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
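
The idea of this "normal" search can be sketched as a plain string match over page texts. The page addresses and texts below are hypothetical stand-ins for what a full-text search engine has indexed; a real engine may tokenise URLs differently.

```python
# Hypothetical page texts standing in for a search engine's full-text index.
PAGES = {
    "http://a.example/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://b.example/notes.html":
        "Nothing relevant here.",
}


def pages_mentioning(url: str, pages: dict[str, str]) -> list[str]:
    """Return addresses of pages whose text contains the URL as an exact string."""
    return sorted(address for address, text in pages.items()
                  if url in text and address != url)


target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(pages_mentioning(target, PAGES))
```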

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 80

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually,
by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
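
The difference between the two orderings can be illustrated as follows. The site names and link counts are hypothetical, and sorting by a raw count of inbound links is a deliberate simplification of the "simply stated" description above; the link analysis that Google actually applies is more elaborate.

```python
# Hypothetical directory category and hypothetical inbound-link counts.
CATEGORY = ["gamma.example", "alpha.example", "beta.example"]
INLINKS = {"alpha.example": 12, "beta.example": 340, "gamma.example": 57}

# Alphabetical order, as in the DMOZ directory.
alphabetical = sorted(CATEGORY)

# Simplified link-based order: more inbound links means a higher rank.
by_links = sorted(CATEGORY, key=lambda site: INLINKS.get(site, 0), reverse=True)

print(alphabetical)
print(by_links)
```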

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
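
The two components described above can be sketched as a tiny inverted index. The pages are hypothetical and are held in memory instead of being collected by a real "robot"; the function names are illustrative, and real search engines add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine sea")))
```

The query is answered from the prepared index only, which is why such a system does not need to visit the Internet in real time.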

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
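
The idea that a page is ranked higher when many pages, or "important" pages, point to it can be illustrated with a simplified link-analysis iteration. This is only a sketch: the link graph, the damping value 0.85 and the function name link_rank are assumptions made for illustration, and the ranking that Google actually applies is not described in this presentation.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-based scoring: a page scores higher when many
    pages, or high-scoring pages, link to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page passes a share of its score to the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_rank(links)
print(max(scores, key=scores.get))
```

In this small graph, C receives the most links and therefore the highest score; A comes second because the "important" page C points to it.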

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
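
What such an automatic expansion amounts to can be sketched as follows. The synonym table is a small hypothetical stand-in, and the function name expand_query is invented for this illustration; which synonyms Google really adds is not known here.

```python
# Hypothetical synonym table; a real system would use a far larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query):
    """Turn each ~word into an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No known synonyms: the word is searched as it is.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("rent ~cheap car"))
```

Only the word marked with the tilde is expanded; the other words of the query stay untouched.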

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
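
Recall and precision can be made concrete with a small calculation. Precision is the share of the retrieved documents that are relevant; recall is the share of the relevant documents that were retrieved. The document identifiers below are invented for the example and the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 5 documents truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 5 relevant documents were found (recall 0.4).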

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
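
For readers unfamiliar with truncation: in retrieval systems that support it, a truncated query term such as comput* matches every indexed word that starts with comput. A minimal sketch, assuming * as the truncation symbol and using a hypothetical word list:

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated term such as 'comput*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical vocabulary of indexed words.
vocabulary = ["computer", "computing", "compute", "commute", "library"]
print(truncation_match("comput*", vocabulary))
print(truncation_match("compute", vocabulary))
```

Without this feature, the searcher has to type each word variant explicitly in the query.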

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
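
What a proximity operator does can be sketched as follows: two words match only when they occur within a given number of words of each other. The function name near and the default distance of 5 words are assumptions chosen for this illustration.

```python
def near(text, word_a, word_b, max_distance=5):
    """True when word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

text = "Open access journals are indexed by several search engines."
print(near(text, "open", "journals", max_distance=2))
print(near(text, "open", "engines", max_distance=3))
```

A plain AND query would accept both pairs of words; the proximity condition rejects the pair that is far apart in the text.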

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
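
A very rough illustration of clustering search results on the fly, using only the retrieved snippets. This is not the method of Vivisimo, which is not described here; the snippets, the stop-word list and the function name cluster_results are hypothetical.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())

    def candidates(snippet):
        words = {w.strip(".,;:!?()").lower() for w in snippet.split()}
        return {w for w in words
                if w and w not in STOP_WORDS and w not in query_words}

    # Count in how many snippets each candidate word occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(candidates(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        words = candidates(snippet)
        if words:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and his wager",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

The headings are derived from the results themselves at search time, so no classification of the documents is needed in advance.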

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
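
The integration step can be sketched as follows: the ranked lists of several primary search systems are interleaved, and a URL that was already seen is dropped. The URL normalisation is deliberately crude, and all names and result lists are hypothetical; real meta-search systems may merge results quite differently.

```python
from itertools import zip_longest

def normalise(url):
    """Crude URL normalisation so that trivial variants count as duplicates."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:
                continue
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/", "http://example.com/page/"]
merged = merge_results(engine_1, engine_2)
print(merged)
```

Interleaving keeps the top result of every engine near the top of the merged list, which is one simple choice among several possible ones.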

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
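
How a tracking service can detect modified and new pages is sketched below: a fingerprint of each page is stored and compared with the fingerprint computed at the next visit. Fetching the pages and sending the alert by email are left out; the page contents are hard-coded and all names are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_pages(previous, current_contents):
    """Compare fingerprints; report changed and new URLs and the updated store."""
    changed, new = [], []
    updated = dict(previous)
    for url, content in current_contents.items():
        digest = fingerprint(content)
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
        updated[url] = digest
    return changed, new, updated

# First visit: nothing is stored yet, so every page counts as new.
visit_1 = {"http://example.org/news": "version 1"}
changed, new, store = check_pages({}, visit_1)

# Second visit: one page was modified and one page appeared.
visit_2 = {
    "http://example.org/news": "version 2",
    "http://example.org/report": "first edition",
}
changed, new, store = check_pages(store, visit_2)
print(changed, new)
```

Only the fingerprints are stored, so the service does not need to keep a full copy of every tracked page.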

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
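
Matching new books against a stored interest profile of keywords can be sketched as follows. The titles and the profile are invented, the function names are hypothetical, and real services such as the one named above may match on subject categories or on other data than the title.

```python
def matches_profile(title, profile_keywords):
    """True when the title contains at least one keyword of the profile."""
    title_words = {w.strip(".,;:!?()").lower() for w in title.split()}
    return any(k.lower() in title_words for k in profile_keywords)

def alerts(new_titles, profile_keywords):
    """Return the new titles that fit the interest profile."""
    return [t for t in new_titles if matches_profile(t, profile_keywords)]

# Hypothetical interest profile stored on the server, and newly published titles.
profile = ["oceanography", "aquaculture"]
new_titles = [
    "Introduction to Oceanography",
    "A History of Printing",
    "Aquaculture: Principles and Practice",
]
print(alerts(new_titles, profile))
```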

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
after a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail
but also the origin of the image as a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
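
The query preparation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus and its entries are hypothetical, and the `OR` operator and the quoting of phrases are assumed conventions that differ between search systems.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only and are not
# taken from any real thesaurus system.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],                       # broader term
        "NT": ["Atlantic Ocean", "Pacific Ocean"],  # narrower terms
        "RT": ["marine science"],                   # related term
        "UF": ["sea"],                              # synonym
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its synonyms and narrower terms into one OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases
    # (an assumed convention; check the help pages of the target system).
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
```

Adding synonyms (UF) and narrower terms (NT) tends to increase recall; whether broader or related terms should also be added depends on how much precision the searcher is willing to give up.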

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
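
The two directions of citation searching can be illustrated with a small sketch. The citation data below is hypothetical; `snowball` follows reference lists back in time, and `cited_by` answers the question that a citation index answers.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Backward search: follow reference lists from the seed into the past."""
    found = []
    queue = list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            # Snowball effect: the references of a reference are followed too.
            queue.extend(references.get(current, []))
    return found


def cited_by(doc, references):
    """Forward search: list the documents whose reference lists contain doc."""
    return sorted(d for d, refs in references.items() if doc in refs)


print(snowball("seed", REFERENCES))
print(cited_by("seed", REFERENCES))
```

The backward search needs only the documents themselves, whereas the forward search needs a database that has indexed the reference lists of many later publications.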

211

***-

Information retrieval:
using citations (scheme)
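Scheme: on a time axis (past, now, future), snowball citation
searching leads from the seed document back to older publications,
while citation indexing leads from the seed document forward to
more recent publications that cite it.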

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax and query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
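
The variants listed above can be generated mechanically. The sketch below is a hypothetical helper, not part of any search system; it assumes that `index.html` (or `index.htm`) is the default document of the directory, which depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the three spellings under which the same page is often linked."""
    base = url
    # Strip an explicit default document name, if present (assumed names).
    for name in ("index.html", "index.htm"):
        if base.endswith("/" + name):
            base = base[: -len(name)]
            break
    # Strip the trailing slash to obtain the shortest form.
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then searched separately, using the link search syntax of the chosen search system, and the results are combined.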

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 81

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
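
The difference between the two orderings can be shown with a toy example. The site names and link counts are hypothetical, and the ranking below uses the simplified rule stated on this slide (a raw count of received links); the link analysis that Google actually applies is more elaborate than a plain count.

```python
# Hypothetical sites in one directory category, with hypothetical counts of
# the links that each site receives from other pages.
SITES = ["alpha.example", "beta.example", "gamma.example"]
INLINKS = {"alpha.example": 12, "beta.example": 340, "gamma.example": 57}


def rank_alphabetically(sites):
    """Ordering as in a directory that sorts links by name only."""
    return sorted(sites)


def rank_by_inlinks(sites, inlinks):
    """Simplified link-based ordering: more received links means a higher rank."""
    # Ties are broken alphabetically so that the result is deterministic.
    return sorted(sites, key=lambda site: (-inlinks.get(site, 0), site))


print(rank_alphabetically(SITES))
print(rank_by_inlinks(SITES, INLINKS))
```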

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW,
can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
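
The mechanism described above can be sketched as a tiny inverted index. The pages below are hypothetical stand-ins for what a crawler ("robot") would have collected beforehand; a real Internet index adds relevance ranking, continuous updating and a far larger scale.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for the output of a crawler.
PAGES = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
    "http://c.example/": "Library science",
}


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index in advance: each word maps to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from the index only; no page is fetched at query time."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # AND search: every word must occur
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "marine science"))
```

Because the query is answered from the prebuilt index, the answer is fast, but it reflects the state of the pages at the time they were collected, not their current contents.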

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same
U.S. Internet company, Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
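
The two ranking rules above (many pages point to a page, and "important" pages point to it) can be sketched with a small iterative calculation in Python. This is a simplified illustration in the spirit of link-based ranking, NOT the actual Google implementation; the damping value 0.85, the number of iterations and the four-page link graph are assumptions made only for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links they receive.

    links: dict mapping a page to the list of pages it points to.
    A page passes its own score on to the pages it links to, so a link
    from a high-scoring ("important") page counts for more.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all point to C; C points to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this example C scores highest because many pages point to it, and A scores higher than B or D because the "important" page C points to it.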

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
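
What such an expansion amounts to can be sketched in Python as follows. The synonym table and the function name are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no known synonyms: keep the word itself
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.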

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
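
Recall and precision, mentioned above, have standard definitions: recall is the share of all relevant documents that were retrieved, and precision is the share of the retrieved documents that are relevant. A minimal sketch in Python, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant ones are among the retrieved documents.
PRECISION, RECALL = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(PRECISION, RECALL)
```

In practice the complete set of relevant documents on the WWW is not known, so recall can usually only be estimated.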

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
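
To make clear what manual truncation means in retrieval systems that do support it: the searcher types the beginning of a word followed by a wildcard, and the system matches all words that start in that way. A minimal sketch in Python, in which the asterisk as truncation symbol and the small vocabulary are assumptions for this example:

```python
from fnmatch import fnmatchcase


def truncation_search(pattern, vocabulary):
    """Return all words of the vocabulary that match a truncated term.

    The asterisk stands for any number of characters, for instance librar*
    """
    pattern = pattern.lower()
    return sorted(word for word in vocabulary if fnmatchcase(word.lower(), pattern))


# Hypothetical index vocabulary.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "literature"]
print(truncation_search("librar*", VOCABULARY))
```

Without such a feature, the searcher has to enter the word variants explicitly, for instance combined with OR.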

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
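
Clustering "on the fly" can be illustrated with a very simple sketch in Python that works only on the short descriptions returned for one query. This is NOT the method used by Vivisimo, which is not described here; the grouping rule (use the most frequent shared word as heading), the stop word list and the example descriptions are assumptions for this illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on", "with"}


def cluster_results(query, snippets, min_size=2):
    """Group result descriptions under a heading word that they share.

    Nothing is prepared before the search: only the retrieved
    descriptions themselves are analysed.
    """
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    words_per_snippet = [
        set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS - query_words
        for snippet in snippets
    ]
    frequency = Counter(word for words in words_per_snippet for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        candidates = [word for word in words if frequency[word] >= min_size]
        if candidates:
            # Most frequent shared word; ties are broken alphabetically.
            heading = max(candidates, key=lambda word: (frequency[word], word))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)


# Hypothetical result descriptions for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free compiler for the Pascal programming language",
    "Pascal unit of pressure",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for heading, members in CLUSTERS.items():
    print(heading, len(members))
```

For this query the sketch separates the results about the philosopher from those about the programming language, and puts the remaining result under "other".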

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
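
Such an integration with removal of repeated results can be sketched in Python as follows. The result lists are hypothetical, and both the scoring rule (a result gets 1/position from each system that returns it) and the address normalisation (ignoring "www." and a trailing slash) are assumptions for this illustration, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce an address to host + path so that variants are recognised
    as the same result (assumption: www.example.org equals example.org)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Merge ranked lists from several search systems into one list
    without repeated results; higher positions give a higher score."""
    scores = {}
    first_seen = {}
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            key = normalise(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / position
            first_seen.setdefault(key, url)
    ranked = sorted(scores, key=lambda key: (-scores[key], key))
    return [first_seen[key] for key in ranked]


# Hypothetical results returned by two primary search systems.
SYSTEM_1 = ["http://www.example.org/a/", "http://example.org/b"]
SYSTEM_2 = ["http://example.org/c", "http://example.org/a"]
MERGED = merge_results([SYSTEM_1, SYSTEM_2])
print(MERGED)
```

The page found by both systems appears only once in the merged list and ends up on top, because both systems contribute to its score.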

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
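
The approach described above can be sketched as follows. The mini-thesaurus, its terms and the OR query syntax are assumptions made for illustration only; real thesaurus systems and databases each have their own vocabulary and query language.

```python
# Hypothetical mini-thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],          # synonym(s)
        "BT": ["water body"],     # broader term(s)
        "NT": ["coastal sea"],    # narrower term(s)
        "RT": ["tide"],           # related term(s)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote each term so that multi-word terms are searched as exact phrases.
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("sea")
```

Adding synonyms and narrower terms tends to raise recall; including broader or related terms widens the search further, usually at the cost of precision.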

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
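
Both directions of citation searching can be sketched on a small citation graph. The document identifiers and reference lists below are hypothetical; in practice the reference lists come from the publications themselves and the inverted index from a citation database.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed_2000": ["a_1995", "b_1990"],
    "a_1995": ["b_1990", "c_1985"],
    "d_2003": ["seed_2000"],
    "e_2004": ["d_2003", "a_1995"],
}


def snowball(seed):
    """Follow reference lists backwards in time, starting from the seed."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for older in CITES.get(doc, []):
            if older not in found:
                found.add(older)
                todo.append(older)  # snowball effect: follow its references too
    return found


def build_citation_index(cites):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, set()).add(citing)
    return index


def cited_by(seed):
    """Look up the more recent documents that cite the seed directly."""
    return build_citation_index(CITES).get(seed, set())
```

Here `snowball("seed_2000")` walks towards older publications, while `cited_by("seed_2000")` uses the inverted index to move towards more recent ones.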

211

***-

Information retrieval:
using citations (scheme)
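[Scheme: a time axis running from past to future, with the seed document placed on it. Snowball citation searching leads from the seed document towards the past; citation indexing leads from the seed document towards the future, up to now.]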

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
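
As a rough illustration of estimating these numbers from the results of a citation search, the sketch below counts citations per article and per author. The records are hypothetical, and the counts from any real database are only an estimate, since they depend on which journals that database covers.

```python
from collections import Counter

# Hypothetical citation search results:
# (citing document, cited document, author of the cited document).
CITATIONS = [
    ("d_2003", "seed_2000", "Author A"),
    ("e_2004", "seed_2000", "Author A"),
    ("e_2004", "a_1995", "Author B"),
    ("f_2005", "other_2001", "Author A"),
]


def citations_per_article(citations):
    """Count how often each cited article appears in the results."""
    return Counter(cited for _citing, cited, _author in citations)


def citations_per_author(citations):
    """Count how often each author's work appears as a cited document."""
    return Counter(author for _citing, _cited, author in citations)
```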

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
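
The variants listed above can be generated mechanically. This is a small sketch that assumes the default page of the site is called index.html (or index.htm); other servers may use a different default file name.

```python
def url_variants(url):
    """Return the spellings of a site address worth trying in a link search."""
    base = url.rstrip("/")
    # Strip an explicit default page, if present (assumed file names).
    for default_page in ("/index.html", "/index.htm"):
        if base.endswith(default_page):
            base = base[: -len(default_page)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/")
```

Each variant can then be submitted as a separate link query, and the result sets combined.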

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
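
A minimal sketch of this "normal" search, assuming a small collection of page texts is already at hand. The page addresses and texts are hypothetical; a real search engine would run the exact-phrase query against its own index.

```python
# Hypothetical page texts, keyed by the address of the page that contains them.
PAGES = {
    "http://example.org/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://example.org/other.html":
        "Nothing relevant here.",
}


def pages_mentioning(url, pages):
    """Return addresses of pages whose text contains the URL as a string."""
    # Drop the scheme so that mentions without 'http://' are also found.
    bare = url.split("://", 1)[-1]
    return sorted(address for address, text in pages.items() if bare in text)


hits = pages_mentioning(
    "http://www.computer.com/directory/subdirectory/web-page.html", PAGES
)
```

Unlike a search in hyperlink fields, this also finds pages that mention the address in plain text without linking to it.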

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
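
The difference between the two orderings can be sketched as follows. The site titles and link counts are hypothetical, and counting received links is only the "simply stated" version given above; the link analysis actually used by Google is more elaborate.

```python
# Hypothetical directory category: (site title, number of links received).
ENTRIES = [
    ("Ocean news", 57),
    ("Aquatic portal", 12),
    ("Marine data centre", 340),
]

# Alphabetical ordering, as in the DMOZ directory.
alphabetical = sorted(ENTRIES, key=lambda entry: entry[0].lower())

# Ordering by received links, most-linked site first.
by_links = sorted(ENTRIES, key=lambda entry: entry[1], reverse=True)
```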

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
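
The mechanism described above can be illustrated with a minimal sketch in Python. It assumes a tiny in-memory collection of pages (the URLs and texts are invented for illustration) instead of pages collected by a real robot, and it is not the implementation of any particular search engine.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a "robot" would collect.
PAGES = {
    "http://example.org/a": "marine science and oceanography",
    "http://example.org/b": "civil engineering and marine structures",
    "http://example.org/c": "fishing and marine biology",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "marine engineering")))
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is why such systems can respond quickly.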

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, the other leading
Internet database and search engine AltaVista, and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
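
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched as a simple iterative calculation. This is only an illustration in the spirit of link analysis, not the actual ranking method of Google; the link graph, the damping value of 0.85 and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that pages linked from many or high-scoring pages rank higher.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and C all point to D; D points to A.
LINKS = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, D ends up highest because three pages point to it, and A comes second because the "important" page D points to it.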

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
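
What such an expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the real system does not expose how it selects synonyms; the sketch only shows how a word marked with a tilde could be turned into a group of alternatives.

```python
# Hypothetical synonym table; a real system would rely on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("rent ~cheap car brussels"))
# prints: rent (cheap OR inexpensive OR affordable) car brussels
```

Only the marked word is expanded; "car" stays as typed because it carries no tilde.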

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered at http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
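
Recall and precision, mentioned above, can be made concrete with a small calculation. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of all relevant documents that were retrieved. The document identifiers below are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 5 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
# prints: 0.5 0.4
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 5 relevant documents were found (recall 0.4).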

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
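
For readers unfamiliar with the term, the sketch below shows what manual truncation does in retrieval systems that do support it. The `*` truncation symbol and the word list are assumptions for the example; the symbol differs from system to system.

```python
def matches_truncated(term, word):
    """True if word matches term, where a trailing * stands for any ending."""
    if term.endswith("*"):
        return word.lower().startswith(term[:-1].lower())
    return word.lower() == term.lower()

WORDS = ["library", "libraries", "librarian", "liberty"]
print([w for w in WORDS if matches_truncated("librar*", w)])
# prints: ['library', 'libraries', 'librarian']
```

Without truncation, each of these word forms has to be typed separately in the query.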

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
User (In / Out)
Client computer + multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
User (In / Out)
Client computer + WWW client program
WWW server computer
Internet / WWW
WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
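
Clustering on the fly can be illustrated with a deliberately naive sketch: each retrieved snippet is placed under the heading word that it shares with the most other snippets. This is not the method used by Vivisimo, which is not described here; the snippets, the stop word list and the heading rule are all assumptions for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word each one contains."""
    query_words = set(query.lower().split())
    tokenised = [
        [w for w in s.lower().split() if w not in STOPWORDS and w not in query_words]
        for s in snippets
    ]
    # Count in how many snippets each candidate heading word appears.
    doc_freq = Counter(w for words in tokenised for w in set(words))
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Ties are broken alphabetically to keep the result deterministic.
            heading = max(words, key=lambda w: (doc_freq[w], w))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)

# Hypothetical snippets returned for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal compiler for the programming language",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results("pascal", SNIPPETS)
for heading, members in clusters.items():
    print(heading, "->", members)
```

The grouping is computed only from the results of the current search, so nothing has to be prepared before the query arrives.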

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
Client computer + multi-threaded Internet search client program
Internet / WWW
WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
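
The integration step with removal of repeated results can be sketched as follows. The two result lists are invented, and the normalisation used to recognise repeats (ignoring a trailing slash and letter case) is a simplifying assumption, not the rule of any specific meta-search system.

```python
def merge_results(result_lists):
    """Interleave ranked result lists from several engines, dropping repeats."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                # Crude normalisation so that near-identical URLs count as one.
                key = results[position].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://example.org/a", "http://example.org/b"]
ENGINE_2 = ["http://example.org/b/", "http://example.org/c"]
print(merge_results([ENGINE_1, ENGINE_2]))
# prints: ['http://example.org/a', 'http://example.org/b/', 'http://example.org/c']
```

Taking one result from each engine in turn keeps the top hits of every source near the top of the merged list.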

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
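
The core of such a tracking service can be sketched as a comparison of two snapshots of page contents. The snapshots below are invented and held in memory; a real service would fetch the pages itself and send its alerts by email, which is left out here.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots (URL -> content) and report new and modified pages."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Hypothetical snapshots taken on two consecutive days.
YESTERDAY = {
    "http://example.org/news": "Conference announced for May.",
    "http://example.org/about": "About this site.",
}
TODAY = {
    "http://example.org/news": "Conference announced for May. Programme added.",
    "http://example.org/about": "About this site.",
    "http://example.org/papers": "New paper on open access.",
}
new_pages, modified = detect_changes(YESTERDAY, TODAY)
print("new:", new_pages)
print("modified:", modified)
```

Storing only the fingerprints instead of the full page texts would be enough to detect modifications on the next visit.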

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
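
Matching new books against a stored interest profile in the form of keywords can be sketched as follows. The profile and the book titles are invented for the example, and matching on title words only is a simplifying assumption; a real service could also use subject categories.

```python
def matching_books(profile_keywords, new_books):
    """Return the new titles that contain at least one keyword of the profile."""
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for title in new_books:
        # Strip common punctuation so that "Biology:" matches "biology".
        title_words = {w.strip(".,:;!?").lower() for w in title.split()}
        if keywords & title_words:
            alerts.append(title)
    return alerts

# Hypothetical stored profile and list of newly published titles.
PROFILE = ["oceanography", "marine"]
NEW_BOOKS = [
    "Introduction to Physical Oceanography",
    "Civil Engineering Handbook",
    "Marine Biology: An Ecological Approach",
]
print(matching_books(PROFILE, NEW_BOOKS))
# prints: ['Introduction to Physical Oceanography', 'Marine Biology: An Ecological Approach']
```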

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                     RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic    Library of Congress, British Library, (Amazon)
To search for book titles published before 1990         national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                            Library of Congress, British Library, Infoball
To find the price of a book                             Global Books in Print, Infoball, online bookshops
To be informed regularly about new books                Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not support searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
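
A minimal sketch of this use of a thesaurus: the searcher looks up each concept, collects synonyms (UF) and narrower terms (NT), and combines them into a Boolean query. The thesaurus entry below is a hypothetical example, and the OR / AND syntax is an assumption; the actual query syntax depends on the database that is searched.

```python
# Hypothetical thesaurus entry, using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}

def expand(term, relations=("UF", "NT")):
    """Return the term followed by the thesaurus terms found through the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return terms

def build_query(concepts, relations=("UF", "NT")):
    """OR the alternatives for each concept, then AND the concepts together."""
    groups = []
    for concept in concepts:
        alts = [f'"{t}"' if " " in t else t for t in expand(concept, relations)]
        groups.append("(" + " OR ".join(alts) + ")")
    return " AND ".join(groups)

print(build_query(["sea", "pollution"]))
# prints: (sea OR ocean OR "coastal sea") AND (pollution)
```

Adding synonyms and narrower terms with OR mainly increases recall; combining the concepts with AND keeps precision under control.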

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
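
A minimal sketch of both directions of citation searching on a small citation graph. The document identifiers and the graph itself are hypothetical; in practice the backward direction reads the reference lists of documents, while the forward direction needs a citation index.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["a1990", "b1995"],
    "b1995": ["c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}

def snowball(doc, depth):
    """Follow reference lists backwards in time, up to the given number of steps."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for d in frontier:
            for ref in CITES.get(d, []):
                if ref != doc and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found

def cited_by(doc):
    """What a citation index offers: the more recent documents that cite doc."""
    return sorted(d for d, refs in CITES.items() if doc in refs)

print(snowball("seed", 2))   # older publications
print(cited_by("seed"))      # more recent publications
```

The length of the list returned by cited_by is the number of citations received by the document within this (limited) data set.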

211

***-

Information retrieval:
using citations (scheme)
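[Scheme: a time axis running from Past through Now to Future.
Starting from the seed document, snowball citation searching
leads back to older publications, and citation indexing leads
forward to more recent publications.]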

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
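
A minimal sketch that generates these variants from one known URL, so that none is forgotten in a link search. The function name is hypothetical, and the rule used here (an "index." file name is dropped, and a last part that contains a dot is treated as a file name) is a simplifying assumption.

```python
def url_variants(url: str) -> list[str]:
    """Return the forms of a URL to try in a link search, without the scheme."""
    rest = url.split("://", 1)[-1]          # drop "http://" if present
    variants = [rest]
    last = rest.rsplit("/", 1)[-1]
    if last.startswith("index."):
        # .../website/index.html is usually the same page as .../website/
        rest = rest[: -len(last)]
        variants.append(rest)
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    elif "/" in rest and "." not in last:
        # a directory written without the trailing slash
        variants.append(rest + "/")
    return variants

for v in url_variants("http://web-server-computer.country/website/index.html"):
    print(v)
```

Each variant can then be used in a separate query, following the link-search syntax of the chosen search system.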

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this
tutorial /
workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

- contents
- summary
- structure
- overview
of this
tutorial /
workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented
meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
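
A minimal sketch of the simply stated idea only: count the links received by each site and sort the directory entries by that count. The site names and links are hypothetical, and the real link analysis by Google is more elaborate than this plain count.

```python
from collections import Counter

# Hypothetical links found on the WWW: (linking page, site that receives the link).
LINKS = [
    ("p1", "siteB"),
    ("p2", "siteB"),
    ("p3", "siteB"),
    ("p1", "siteA"),
]

def rank_by_inbound_links(entries, links):
    """Sort entries by number of links received (most first), then alphabetically."""
    inbound = Counter(target for _, target in links)
    return sorted(entries, key=lambda e: (-inbound[e], e))

alphabetical = ["siteA", "siteB", "siteC"]   # the order in a purely alphabetical listing
print(rank_by_inbound_links(alphabetical, LINKS))
```

Here siteB moves to the top because it receives three links, while siteC, which receives none, stays at the bottom.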

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory (covering the complete WWW)
can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by querying a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
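
The two components described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the "robot" has already collected the pages (the dictionary `pages` and its example.org URLs are invented for the example) and that a query retrieves pages containing all query words (an assumption of this sketch, not a statement about any real search engine).

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical stand-in for pages collected by the crawler ("robot").
pages = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Civil engineering and marine structures",
}
index = build_index(pages)
hits = search(index, "marine engineering")
print(hits)
```

The query is answered from the index alone; the pages themselves are not visited at query time, which is the point made on the preceding slides.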

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
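
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a small iterative computation. The sketch below follows the simplified textbook formulation of link-based ranking; it is not Google's actual implementation, and the graph, the damping value 0.85 and the function name `link_rank` are assumptions made for the example.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Score pages iteratively from the link graph.

    graph maps each page to the list of pages it links to;
    every link target must also appear as a key.
    """
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, shared over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score

# Hypothetical miniature web: three pages point to "hub".
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = link_rank(graph)
best = max(scores, key=scores.get)
print(best)
```

In this miniature web, "hub" receives the most links and ends up on top, while "c", to which nothing points, ends up at the bottom.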

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
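
What such an automatic expansion amounts to can be sketched as follows. The synonym table is a tiny, invented stand-in for a real thesaurus, and the function `expand_query` is hypothetical; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("cheap ~car rental")
print(expanded)
```

Only the word marked with the tilde is expanded; "cheap" is left as typed even though the table contains a synonym for it.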

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
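
Recall and precision, mentioned above, can be computed for a single search when it is known which documents are relevant. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 1/2), and 2 of the 3 relevant documents were found (recall 2/3).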

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
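
For readers who are not familiar with truncation: in retrieval systems that support it, a truncated term matches every word that starts with the given stem. The sketch below assumes the asterisk as the truncation symbol, which is a common convention but differs between systems; the function name and the sample text are invented.

```python
import re

def truncation_to_regex(term):
    """Turn a query term, optionally ending in *, into a word pattern."""
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

text = "Libraries catalogue books; a librarian maintains the library catalogue."
matches = truncation_to_regex("librar*").findall(text)
exact = truncation_to_regex("library").findall(text)
print(matches, exact)
```

Without truncation, the searcher has to type each word form (library, libraries, librarian) separately.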

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
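
A proximity operator such as NEAR requires that two words occur close to each other in the document. A minimal sketch of the test, assuming that distance is counted in words and with an invented function name and sample sentence:

```python
import re

def near(text, word1, word2, distance=5):
    """True if word1 and word2 occur within `distance` words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)

doc = "Open access repositories give free access to scientific articles"
print(near(doc, "free", "scientific", distance=3))
print(near(doc, "open", "articles", distance=3))
```

A search system that offers this operator stores word positions in its index, so that the test does not require reading the documents again at query time.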

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
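
Clustering "on the fly" means that the grouping is computed from the retrieved results only, at the moment of the search. The sketch below is a deliberately naive illustration that groups result snippets under their most frequent non-query words; it is not the algorithm used by Vivisimo, and the stopword list, the function name and the snippets are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "his"}

def cluster_results(query, snippets, max_labels=2):
    """Group result snippets under the most frequent non-query words."""
    query_words = set(query.lower().split())
    tokenised = [
        set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS - query_words
        for snippet in snippets
    ]
    # Count in how many snippets each candidate label occurs.
    frequency = Counter(word for words in tokenised for word in words)
    ranked = sorted(frequency, key=lambda word: (-frequency[word], word))
    labels = ranked[:max_labels]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal the philosopher and his wager",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

Because only the retrieved snippets are analysed, no pre-processing of the whole document collection is needed before the search.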

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
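
The integration of results with removal of repeated results can be sketched as follows. The rule used to decide that two URLs are "the same" (ignore case, a leading "www." and a trailing slash) is a crude assumption of this sketch, and the two result lists are invented; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key (a deliberately crude rule)."""
    parts = urlsplit(url.lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.vivisimo.com/", "http://dogpile.com/"]
engine_b = ["http://vivisimo.com", "http://www.kartoo.com/"]
merged = merge_results(engine_a, engine_b)
print(merged)
```

The two spellings of the Vivisimo address are recognised as one result, so the merged list contains three items instead of four.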

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
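
The core of such a tracking service can be sketched as a comparison of fingerprints of successive snapshots of a page. In the sketch below the page contents are passed in as strings, so fetching the page and sending the e-mail alert are left out; the class name `PageWatcher` and the example URL are invented.

```python
import hashlib

class PageWatcher:
    """Remember a fingerprint per URL and report when the content changes."""

    def __init__(self):
        self._fingerprints = {}

    def check(self, url, content):
        """Return 'new', 'changed' or 'unchanged' for the given snapshot."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._fingerprints.get(url)
        self._fingerprints[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"

watcher = PageWatcher()
url = "http://www.example.org/news.html"
statuses = [
    watcher.check(url, "<p>Version 1</p>"),
    watcher.check(url, "<p>Version 1</p>"),
    watcher.check(url, "<p>Version 2</p>"),
]
print(statuses)
```

Only the fingerprint is stored, not the page itself; this is enough to detect that something changed, but not to show what changed.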

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
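
How a stored interest profile can be matched against newly published books is sketched below. The record layout (a title and a category per book), the profile contents and the function name are assumptions made for this example; the actual alerting mechanism of Amazon is not described here.

```python
def matches_profile(book, profile):
    """True if a book fits the keywords or subject categories of a profile."""
    title_words = set(book["title"].lower().split())
    keywords = {keyword.lower() for keyword in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

# Hypothetical interest profile stored on the server of the system.
profile = {"keywords": ["oceanography"], "categories": ["Marine science"]}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Fish of the North Sea", "category": "Marine science"},
    {"title": "Civil Engineering Basics", "category": "Engineering"},
]
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
```

The first book matches through a keyword in its title, the second through its subject category.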

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
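
The idea of simultaneous searching can be sketched as follows. The three "catalogues" are invented in-memory lists standing in for remote target systems, and the function names are hypothetical; real services such as those listed in the examples use their own protocols (for instance Z39.50) and interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues: name -> list of titles.
CATALOGUES = {
    "library_a": ["Marine Biology", "Hydrology Basics"],
    "library_b": ["Marine Biology", "Ocean Currents"],
    "bookdealer_c": None,  # simulates a target system that is unavailable
}


def search_catalogue(name, query):
    """Search one target system; raise if it cannot be reached."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not available")
    return [t for t in titles if query.lower() in t.lower()]


def meta_search(query):
    """Send one query to all targets in parallel and merge the answers."""
    merged, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_catalogue, name, query)
                   for name in CATALOGUES}
    for name, future in futures.items():
        try:
            for title in future.result():
                # Record which targets hold each title (merges duplicates).
                merged.setdefault(title, []).append(name)
        except ConnectionError:
            failed.append(name)
    return merged, failed


results, unavailable = meta_search("marine")
print(results, unavailable)
```

The sketch shows why the result depends on the target systems: a target that does not answer simply contributes nothing, and only a simple query form that all targets understand can be passed on.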

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
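
This use of a thesaurus can be sketched as a small query-expansion step. The thesaurus entry, the function name and the OR syntax of the resulting query are assumptions for illustration only; each real database has its own query language, and each real thesaurus its own terms.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],
        "NT": ["Atlantic Ocean"],
        "RT": ["marine science"],
        "UF": ["sea"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Add synonyms and narrower terms to a search term, joined with OR."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("ocean")
print(query)
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus is what helps precision.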

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
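
The two directions of citation searching can be sketched on a small citation graph. The document names and the data are invented for illustration; a real citation index covers millions of documents and is queried through its own interface.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
CITES = {
    "seed": ["old_1", "old_2"],
    "old_1": ["older"],
    "new_1": ["seed"],
    "new_2": ["seed", "old_1"],
}


def snowball(seed):
    """Follow reference lists backwards in time, collecting older documents."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in CITES.get(doc, []):
            if ref not in found:
                found.append(ref)
                queue.append(ref)  # its own references are followed too
    return found


def cited_by(doc):
    """Look up more recent documents that cite doc, as a citation index does."""
    return sorted(d for d, refs in CITES.items() if doc in refs)


older = snowball("seed")
newer = cited_by("seed")
print(older, newer)
```

The backward search needs only the reference lists of the documents themselves, whereas the forward search needs an index built over many other publications.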

211

***-

Information retrieval:
using citations (scheme)
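[Scheme: a time axis (past, now, future). Starting from the seed
document, snowball citation searching leads back to older
publications, while citation indexing leads forward to more recent
ones.]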

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
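
The variants listed above can be generated from one known address, as in the following sketch. The function name is hypothetical, and the link-search operator itself is deliberately left out because it differs from one search system to another.

```python
def url_variants(url):
    """Return the address with 'index.html', with a trailing slash, and bare."""
    base = url
    if base.endswith("index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/index.html")
for v in variants:
    print(v)
```

Each variant can then be used in a separate link search, and the results combined.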

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 84

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via the WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
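
The difference between alphabetical sorting and ranking by received links can be sketched as follows. The site names and link data are invented, and counting raw links is only the simplified picture given above; Google's real link analysis is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (linking page, linked site) pairs.
LINKS = [
    ("p1", "site_b"), ("p2", "site_b"), ("p3", "site_a"),
    ("p4", "site_c"), ("p5", "site_b"), ("p6", "site_c"),
]

# Entries of one directory category, sorted alphabetically.
directory_entries = ["site_a", "site_b", "site_c"]

# Count the links received by each site.
inlinks = Counter(target for _, target in LINKS)

# Rank by number of received links (highest first); ties stay alphabetical.
ranked = sorted(directory_entries, key=lambda site: (-inlinks[site], site))
print(ranked)
```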

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
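
The mechanism described above (collected pages, an index built from their texts, and a search through that index) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the pages are hypothetical and already collected, and the query words are combined with an implicit AND.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a robot might have collected earlier.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine sediments",
    "http://c.example/": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine biology")))
```

The query is answered from the index alone; the pages themselves are not contacted at search time, which is why such systems can respond quickly.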

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web no longer builds its
own unique database of WWW sites, but relies on the same
WWW database as the earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
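
The two ranking criteria (many pages point to a page, and "important" pages point to it) can be illustrated with a small iterative link-scoring sketch. This is a simplified textbook-style scheme, not the actual algorithm used by Google; the graph, the damping value of 0.85 and the number of iterations are assumptions chosen for illustration.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages from the link graph.

    graph: dict mapping each page to the list of pages it links to;
           every linked page must also appear as a key (assumption).
    """
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: C receives two links; D and E receive one each,
# but D is linked from the well-linked page C and E only from A.
graph = {
    "A": ["C", "E"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": [],
}
scores = link_scores(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, D and E each receive a single link, yet D scores higher because its link comes from a page that is itself well linked.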

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
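
What such an automatic expansion amounts to can be sketched as follows. The synonym table is a small hypothetical one made up for this illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical, hand-made synonym table; a real system would use a far larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("rent ~car brussels"))
```

Only the word marked with the tilde is widened; the other words of the query stay as typed.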

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
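
The two quality measures named above, recall and precision, can be computed for a single search as in the following sketch. It uses the usual definitions (precision: share of the retrieved items that are relevant; recall: share of the relevant items that were retrieved); the document identifiers are hypothetical.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_and_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.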

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
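
To make clear what is missing, the following sketch shows what manual right-hand truncation does in systems that do support it. The `*` truncation symbol and the small vocabulary are assumptions for illustration; the symbol differs between retrieval systems.

```python
def truncate_match(pattern, vocabulary):
    """Return the indexed words matched by a right-truncated term such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return [pattern] if pattern in vocabulary else []

# Hypothetical vocabulary of an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
print(truncate_match("librar*", vocabulary))
print(truncate_match("library", vocabulary))
```

Without truncation, the searcher has to type each variant of the word in a separate query or combine the variants with OR.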

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
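
As an illustration of the first missing feature, a proximity operator such as NEAR checks whether two query words occur within a few words of each other. A minimal sketch, assuming a maximum distance of 3 words and a made-up sentence:

```python
import re

def near(text, word_a, word_b, distance=3):
    """True when word_a and word_b occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= distance for a in positions_a for b in positions_b)

# Hypothetical sentence from a document.
text = "Open access archives give free access to scientific articles."
print(near(text, "access", "scientific"))
print(near(text, "open", "articles"))
```

A plain AND query would accept both word pairs, because it only checks that the words occur somewhere in the same document.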

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
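
Clustering on the fly means that the groups are derived from the retrieved results themselves at search time. The following is a deliberately naive sketch of that idea, not the method used by Vivisimo: each result snippet is filed under the word that it shares with the most other snippets. The snippets, the stopword list and the labelling rule are all assumptions.

```python
import re
from collections import Counter, defaultdict

# Small hypothetical stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, snippets):
    """Group result snippets under the most widespread word each one contains."""
    query_words = set(query.lower().split())
    word_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        word_sets.append(words - STOPWORDS - query_words)
    # In how many snippets does each word occur?
    frequency = Counter(word for words in word_sets for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        # Most widespread word first; ties are broken alphabetically.
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and his wager",
]
for label, group in cluster_results("pascal", snippets).items():
    print(label, "->", group)
```

Nothing is computed before the query arrives; the headings emerge only from the words in the hits that came back.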

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
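
The integration step with removal of repeated results can be sketched as follows. The result lists are hypothetical, and both the URL normalisation and the interleaving by rank are simplifying assumptions; actual meta-search systems may merge and rank in other ways.

```python
def normalize(url):
    """Reduce trivial URL variants to one form (a simplifying assumption)."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results returned by two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
engine_2 = ["http://dogpile.com", "http://www.kartoo.com/", "http://www.vivisimo.com"]
print(merge_results([engine_1, engine_2]))
```

Five incoming results are reduced to three, because variants of the same address are treated as one hit.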

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
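
The distinction between modified and new pages can be sketched by comparing two snapshots of collected pages. This is a minimal illustration that assumes the page texts have already been downloaded; the URLs and texts are hypothetical, and a real alert service would also have to fetch the pages and send the e-mail.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots {url: page text}; return (new pages, modified pages)."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Hypothetical snapshots taken on two different days.
previous = {
    "http://a.example/": "Conference programme, version 1",
    "http://b.example/": "Library opening hours",
}
current = {
    "http://a.example/": "Conference programme, version 2",
    "http://b.example/": "Library opening hours",
    "http://c.example/": "New open access journal announced",
}
new_pages, modified = detect_changes(previous, current)
print("new:", new_pages)
print("modified:", modified)
```

Storing only the digest of each page is enough to notice a modification later, so the service does not need to keep full copies of the earlier versions.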

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
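The matching of a new book against stored interest profiles can be sketched as follows. This is only an illustration under simple assumptions: the class `InterestProfile`, the functions `matches` and `users_to_alert`, and the matching rule (a keyword occurs as a word in the title, or a subject category is shared) are hypothetical and do not describe how Amazon or any other real system works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user profile stored on the server (hypothetical structure)."""
    user: str
    keywords: set = field(default_factory=set)   # free keywords
    subjects: set = field(default_factory=set)   # subject categories / fields


def matches(profile, title, subjects):
    """Return True when a new book fits the interest profile.

    Assumed rule: a profile keyword occurs as a word in the title,
    or the book shares at least one subject category with the profile.
    """
    title_words = {w.strip(".,:;!?()").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    profile_subjects = {s.lower() for s in profile.subjects}
    book_subjects = {s.lower() for s in subjects}
    return keyword_hit or bool(profile_subjects & book_subjects)


def users_to_alert(profiles, title, subjects):
    """List the users who should be alerted about a newly published book."""
    return [p.user for p in profiles if matches(p, title, subjects)]
```

A real service would probably also use stemming and phrase matching; the sketch only shows the two kinds of profile mentioned above.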

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
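The principle of simultaneous searching can be sketched as follows. The sketch assumes that each target system is represented by a plain Python function that returns records with an `isbn` and a `title`; the function `meta_search` and this record layout are hypothetical and are not the interface of any of the services named on these slides. It shows why the result depends on the availability of the targets: a target that fails is reported, and the others still contribute.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, targets):
    """Send one query to several catalogues in parallel and merge the results.

    targets: dict mapping a catalogue name to a function(query) -> list of
             records, each record a dict with 'isbn' and 'title' (assumed layout).
    Returns (merged, failed): merged maps ISBN to title and source names,
    failed lists the catalogues that did not answer.
    """
    def safe_call(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            return name, None  # target system unavailable

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        outcomes = list(pool.map(safe_call, targets.items()))

    merged, failed = {}, []
    for name, records in outcomes:
        if records is None:
            failed.append(name)
            continue
        for rec in records:
            # The same book found in several catalogues is listed once.
            entry = merged.setdefault(rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(name)
    return merged, failed
```

Only the query string is passed to every target, which reflects the limitation noted above: a meta-search can offer only the search options that all target systems have in common.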

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
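The use of a thesaurus to formulate a query can be sketched as follows. The small `THESAURUS` dictionary contains made-up entries for illustration only, the function `expand_query` is hypothetical, and the Boolean `OR` syntax is an assumption: the query syntax differs from one search system to another.

```python
# Made-up thesaurus entries, for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader terms
        "NT": ["bay", "gulf"],      # narrower terms
        "RT": ["tide"],             # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found in the thesaurus.

    Synonyms and narrower terms (the default) tend to increase recall;
    the searcher decides which relations to follow.
    """
    terms = [term]
    entry = thesaurus.get(term.lower(), {})
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    # Terms of several words are quoted so that they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

For example, `expand_query("sea", THESAURUS)` combines the original word with its synonym and narrower terms; replacing a vague word by one narrower term instead would be a way to aim for higher precision.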

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
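The two directions of citation searching can be sketched as follows, assuming that the reference lists of a set of documents are available as a simple mapping. The functions `snowball` and `build_citation_index` and the document identifiers used with them are hypothetical; real citation indexes are far larger and need careful matching of references.

```python
from collections import defaultdict, deque


def snowball(seed, references, max_depth=2):
    """Follow reference lists back in time, starting from the seed document.

    references: dict mapping a document to the list of documents it cites.
    max_depth:  number of snowball steps (1 = only the seed's own references).
    """
    found = set()
    queue = deque([(seed, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue
        for ref in references.get(doc, []):
            if ref != seed and ref not in found:
                found.add(ref)
                queue.append((ref, depth + 1))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)
    return cited_by
```

Snowballing needs only the documents themselves, whereas looking forward in time requires the inverted index, which is why a citation index has to be produced as a separate database.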

211

***-

Information retrieval:
using citations (scheme)
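[Scheme: a time axis labelled Past, Now and Future, with the seed document on it. Snowball citation searching leads from the seed document back to the past; citation indexing leads from the seed document forward to the future.]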

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
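Generating the variants to search for can be sketched as follows. The function `url_variants` is hypothetical, and it assumes that the default page of the site is named `index.html` (or `index.htm`); the actual name depends on the configuration of the web server. The link-search syntax itself is not shown, as it differs per search system.

```python
def url_variants(url):
    """Return the spellings of one page address that are worth searching for."""
    # Drop the scheme, as in the variants listed above that start with "//".
    rest = url.split("://", 1)[-1]
    base = rest.rstrip("/")
    # Assumed names of the default page; this depends on the web server.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name) - 1]
            break
    return [base + "/index.html", base + "/", base]
```

Each variant is then used in a separate link search, and the result lists are combined.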

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end
8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
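
The "more links received, higher rank" idea can be sketched as follows. This is only an illustration: the site names and links are invented, and the ranking actually used in the Google directory is certainly more refined than a plain count of incoming links.

```python
from collections import Counter


def rank_by_inbound_links(directory_entries, links):
    """Sort directory entries by the number of links pointing to them."""
    inbound = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order, as in a plain directory listing.
    return sorted(directory_entries, key=lambda site: (-inbound[site], site))


# Hypothetical directory category and hypothetical (source, target) links.
entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("z.example", "beta.example"),
]
ranked = rank_by_inbound_links(entries, links)
print(ranked)
```

An alphabetical listing would start with alpha.example; the link-based listing puts the most linked site first.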

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
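
The two components described above (an index built from page texts, and a search function that consults only that index) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the pages are a small in-memory dictionary with invented URLs, standing in for what a "robot" would have collected, and real search engines use far more elaborate data structures.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as if already collected by a crawler.
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine sediments",
    "http://c.example/": "Civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine biology")
print(hits)
```

Note that the query never touches the pages themselves: it is answered from the index, which is why results can be out of date when a page has changed since the last crawl.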

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
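
Both ranking criteria (many pages point to a page, and "important" pages point to it) can be illustrated with a small iterative calculation in the spirit of the published PageRank idea. This is a simplified sketch, not the algorithm that Google actually runs; the graph, the damping value and the number of iterations are assumptions chosen for illustration.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Score pages iteratively; graph maps each page to the pages it links to."""
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: every target must also appear as a key.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = link_rank(graph)
best = max(scores, key=scores.get)
print(best, scores)
```

Page "hub" receives the most links and ends on top. Pages "a" and "b" receive only one link each, but it comes from the high-scoring "hub", so they end above "c", which receives no links at all.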

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
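
What such an expansion amounts to can be sketched as follows: each word marked with a tilde is replaced by an OR-group of the word and its synonyms. The synonym table and the output syntax are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("~cheap car rental")
print(expanded)
```

Only the marked word is expanded; "car" stays as typed, even though the table contains synonyms for it.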

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning in the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
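
Recall and precision, the two measures named above, can be computed as in the sketch below. The document identifiers and result lists are invented for illustration; they do not come from a real comparison of search systems.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / retrieved; recall = relevant hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical situation: 4 relevant documents exist (d1..d4).
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_results = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_results, relevant)
specialised = precision_recall(specialised_results, relevant)
print(general, specialised)
```

In this invented case the specialised system scores better on both measures: it returns fewer irrelevant items and finds more of the relevant ones.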

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
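
To make clear what manual truncation means in systems that do support it, the sketch below translates a truncated query term such as librar* into a pattern that matches all words starting with that stem. The asterisk as truncation symbol is an assumption taken from common practice in bibliographic retrieval systems.

```python
import re


def truncation_to_regex(term):
    """Turn a query term like 'librar*' into a whole-word regular expression."""
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


text = "Libraries and librarians maintain the library catalogue."
truncated_matches = truncation_to_regex("librar*").findall(text)
exact_matches = truncation_to_regex("library").findall(text)
print(truncated_matches, exact_matches)
```

Without truncation, the searcher has to type each variant (library, libraries, librarians…) explicitly and combine them with OR.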

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
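
A proximity operator such as NEAR checks whether two words occur within a few words of each other. A minimal sketch, assuming a maximum distance counted in words (the default of 3 is an arbitrary choice for illustration):

```python
import re


def near(text, word_a, word_b, max_distance=3):
    """True if the two words occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


doc = ("Open access journals improve access to marine research "
       "for many libraries")
print(near(doc, "marine", "research"))
print(near(doc, "open", "libraries"))
```

A plain AND query would accept this document for both word pairs; the proximity test accepts only the pair that stands close together, which usually indicates a closer relation in meaning.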

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
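
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, at the moment of the search. The sketch below illustrates the principle with a deliberately naive method: each result is filed under the term that it shares with the largest number of other results. It is not the method used by Vivisimo, which is not described here; the snippets and the stop word list are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term they contain."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def terms(snippet):
        return {w for w in re.findall(r"\w+", snippet.lower())
                if w not in STOPWORDS and w not in query_words}

    # Document frequency: in how many snippets does each term occur?
    frequency = Counter(t for s in snippets for t in terms(s))
    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Most shared term wins; ties are broken alphabetically.
            label = min(candidates, key=lambda t: (-frequency[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Blaise Pascal and the history of mathematics",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

For the ambiguous query "pascal" this yields one group about programming and one about Blaise Pascal, computed without any processing of the documents before the search.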

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
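
The integration step can be sketched as follows: the ranked lists obtained from several primary search systems are interleaved, and a result is dropped when its address was already delivered by another system. The URLs are invented, and the normalisation rule (ignore "www." and a trailing slash) is only one assumed way to recognise repeated results.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    followed by the path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.a.example/", "http://b.example/page"]
engine_2 = ["http://a.example", "http://c.example/"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Interleaving by rank keeps the top results of every system near the top of the merged list, instead of showing all results of the first system before those of the second.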

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
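
The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration, not the method of any particular bookshop: the profile layout, the function name `fits_profile` and the sample titles are all assumptions made for the example.

```python
# Hypothetical interest profile, stored as keywords and subject categories.
profile = {
    "keywords": {"aquaculture"},
    "subjects": {"marine biology"},
}

# Hypothetical list of newly published books: (title, subject category).
new_books = [
    ("Aquaculture and the Environment", "Fisheries"),
    ("Introduction to Coral Reefs", "Marine biology"),
    ("Medieval Cookery", "History"),
]


def fits_profile(profile, title, subject):
    """Return True when a title word or the subject category matches the profile."""
    words = {word.strip(".,:;!?").lower() for word in title.split()}
    keyword_hit = bool(words & profile["keywords"])
    subject_hit = subject.lower() in profile["subjects"]
    return keyword_hit or subject_hit


# Titles for which the user would receive an alert.
alerts = [title for title, subject in new_books if fits_profile(profile, title, subject)]
print(alerts)
```

A real service would run such a comparison each time its database is updated and send the resulting titles by email.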

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
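
The principle of simultaneous searching can be sketched as follows. This is a simplified illustration with simulated catalogues; the names `CATALOGUES`, `search_one` and `meta_search` are assumptions, and a real meta-search service would contact each target system over the network (for instance using Z39.50).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues. None marks a target that is
# currently unavailable; a real service would query each system over the network.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Management"],
    "bookdealer_c": None,
}


def search_one(name, query):
    """Search a single (simulated) catalogue for titles containing the query."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not reachable")
    return [(name, title) for title in titles if query.lower() in title.lower()]


def meta_search(query):
    """Send one query to all catalogues in parallel and merge the answers."""
    hits, failed = [], []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
        for name, future in futures.items():
            try:
                hits.extend(future.result())
            except ConnectionError:
                failed.append(name)
    return hits, failed


hits, failed = meta_search("ocean")
print(hits)
print(failed)
```

The sketch also shows why the result depends on the target systems: an unavailable catalogue simply contributes nothing, and only a simple query that every target understands can be passed on.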

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
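
How thesaurus relations can help to formulate a query can be sketched as follows. The thesaurus entry, the function names and the OR syntax are assumptions made for illustration; each search system has its own query syntax, so the help pages of the target database should be consulted.

```python
# Hypothetical thesaurus entry using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand(term, relations=("UF", "NT")):
    """Collect the term plus the terms found through the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_query(terms):
    """Join the terms with OR; multi-word terms are quoted as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


terms = expand("sea")
query = to_query(terms)
print(query)
```

Adding synonyms and narrower terms mainly increases recall; selecting a more suitable, more specific term instead of the original one mainly increases precision.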

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
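
The two directions of citation searching can be sketched with a small citation graph. The data and the function names `snowball` and `cited_by` are invented for the example; a real citation index holds millions of such relations.

```python
# Hypothetical toy data: each document maps to the documents that it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, depth):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for document in frontier:
            for reference in CITES.get(document, []):
                if reference != seed and reference not in found:
                    found.append(reference)
                    next_frontier.append(reference)
        frontier = next_frontier
    return found


def cited_by(target):
    """Look forwards in time: which documents cite the target?"""
    return sorted(doc for doc, references in CITES.items() if target in references)


older_documents = snowball("seed", depth=2)
newer_documents = cited_by("seed")
print(older_documents)
print(newer_documents)
```

The first function needs only the documents themselves, while the second one requires an index that has been built over all reference lists, which is what a citation index offers.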

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) around the seed document;
snowball citation searching points to the past,
citation indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
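
Generating these variants can be sketched as follows. The function name `url_variants` is an assumption, and only the index file names `index.html` and `index.htm` are handled; other default file names may occur in practice.

```python
def url_variants(url):
    """Return common spellings of the same address, for use in link searches."""
    variants = [url]
    base = url
    # Drop a default index file name, but keep the trailing slash.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    if base not in variants:
        variants.append(base)
    # Also try the address without the trailing slash.
    stripped = base.rstrip("/")
    if stripped not in variants:
        variants.append(stripped)
    return variants


variants = url_variants("//web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant is then submitted as a separate link query, and the result lists are combined.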

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
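
The two components described above can be illustrated with a minimal sketch in Python. It does not claim to show how any particular search engine is built. The page texts, the example.org addresses and the function names build_index and search are hypothetical, and the "robot" step is replaced by a small dictionary of pages that are assumed to have been collected already.

```python
from collections import defaultdict

# Hypothetical pages already collected by a "robot" (URL -> page text).
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine engineering",
    "http://example.org/c": "history of the North Sea coast",
}

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "north sea")))
```

Note that the query is answered from the index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.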

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
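
The idea that a page ranks higher when many pages, and in particular "important" pages, point to it can be sketched as follows. This is a simplified iterative link-analysis calculation written for illustration only, not Google's actual algorithm. The link graph, the function name link_rank and the damping value 0.85 are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from high-scoring pages more.

    links: dict mapping each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: A and B point to C, and C points to D.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
scores = link_rank(LINKS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this small graph, C scores higher than A and B because two pages point to it, and D scores well because the "important" page C points to it.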

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
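
What such an automatic expansion might do internally can be sketched in Python. This is only an illustration under stated assumptions: the synonym table, the function name expand_query and the OR notation of the output are hypothetical and do not describe how Google implements the tilde operator.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap hotel brussels"))
```

Only the word marked with a tilde is expanded; the other words of the query are left as they are.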

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
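
Recall and precision, mentioned above as measures of the quality of a result set, can be computed as in the following minimal sketch. The document identifiers and the function name precision_recall are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are in fact relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

In practice the complete set of relevant documents on the WWW is not known, so recall can only be estimated.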

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
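
To make clear what is missing, the following sketch shows right truncation as it is offered by many classical retrieval systems: a pattern ending in * matches every indexed word that starts with the given letters. The vocabulary and the function name truncation_match are hypothetical.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical list of indexed words.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "index"]
print(truncation_match("librar*", VOCABULARY))
```

Without this feature the searcher has to type the word variants one by one and combine them manually.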

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
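
As an illustration of the first missing feature, a proximity (NEAR) operator can be sketched as follows. The function name near, the default distance of 5 words and the sample sentence are assumptions made for this example.

```python
def near(text, word1, word2, max_distance=5):
    """True when word1 and word2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

TEXT = "Marine biology is studied at the coast; hydrology is taught elsewhere in the region."
print(near(TEXT, "marine", "biology", 1))
print(near(TEXT, "marine", "region", 3))
```

A proximity operator is stricter than Boolean AND (both words must occur close together) but less strict than an exact phrase search.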

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
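
Clustering on the fly can be sketched with a deliberately simple method: words that occur in several retrieved snippets become headings, and each result is placed under the first heading it contains. This is an illustration only and not the method used by Vivisimo, which is not described here. The snippets, the stop word list and the function name cluster_results are hypothetical.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by"}

def cluster_results(query, snippets, max_clusters=2):
    """Group result snippets under headings taken from frequent co-occurring words."""
    query_words = set(query.lower().split())
    tokenised = [
        [w for w in s.lower().split() if w not in STOP_WORDS and w not in query_words]
        for s in snippets
    ]
    # Count in how many snippets each candidate heading word occurs.
    doc_freq = Counter(w for words in tokenised for w in set(words))
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    headings = [w for w, _ in ordered][:max_clusters]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal philosopher biography",
]
print(cluster_results("pascal", SNIPPETS))
```

Only the retrieved snippets are analysed, at the moment of the search, so no pre-processing of the documents is needed.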

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
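
The integration of results with removal of repeated results can be sketched as follows. The ranked lists are interleaved and an address that was already seen is dropped. The normalisation of addresses is a crude assumption (among other things it ignores upper and lower case everywhere), and the function names and sample result lists are hypothetical.

```python
def normalise(url):
    """Crude URL normalisation (an assumption): ignore case, scheme, 'www.' and a trailing slash."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://bubl.ac.uk/link/"]
engine_b = ["http://bubl.ac.uk/link", "http://www.dmoz.org/"]
print(merge_results(engine_a, engine_b))
```

Interleaving by rank is only one possible design choice; a meta-search system could also re-rank the merged list with its own relevance measure.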

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
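
The difference between tracking modified pages and finding new pages can be sketched by comparing two snapshots of collected pages. This is a minimal illustration, not the method of any particular alert service. The example.org addresses, the page contents and the function names are hypothetical, and the step of fetching the pages from the WWW is left out.

```python
import hashlib

def fingerprint(page_content):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots (URL -> content) and report new and modified pages."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return {"new": new_pages, "modified": modified}

# Hypothetical snapshots taken on two different days.
YESTERDAY = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About us",
}
TODAY = {
    "http://example.org/news": "Conference programme published",
    "http://example.org/about": "About us",
    "http://example.org/papers": "New papers",
}
print(detect_changes(YESTERDAY, TODAY))
```

An alert service would store only the fingerprints between visits and send an e-mail message when the report is not empty.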

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
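
Matching a new book against a stored interest profile can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular bookshop: the names InterestProfile and matches_profile are hypothetical, and a real system would match against richer bibliographic records.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical interest profile stored on the server of an alerting system."""
    keywords: set = field(default_factory=set)
    subject_categories: set = field(default_factory=set)


def matches_profile(title, categories, profile):
    """Return True when a new book fits the profile.

    A book fits when any profile keyword occurs as a word in the title,
    or when the book shares at least one subject category with the profile.
    """
    # Normalise title words: strip common punctuation and ignore case.
    title_words = {word.strip(".,:;!?").lower() for word in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = bool(
        {c.lower() for c in categories}
        & {c.lower() for c in profile.subject_categories}
    )
    return keyword_hit or category_hit
```

A service built on this idea would run the check for every newly catalogued book and send a message to each user whose profile matches.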

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
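
The idea of a union catalogue can be sketched in a few lines. The record format (a dict with "isbn" and "title") and the function name build_union_catalogue are assumptions made for this illustration; real union catalogues such as Copac use far more elaborate matching rules.

```python
def build_union_catalogue(catalogues):
    """Merge the catalogues of several libraries into one union catalogue.

    `catalogues` maps a library name to a list of records; each record is a
    dict with at least "isbn" and "title" (a simplified, hypothetical format).
    Records with the same ISBN are merged and the holding libraries are listed.
    """
    union = {}
    for library, records in catalogues.items():
        for record in records:
            entry = union.setdefault(
                record["isbn"], {"title": record["title"], "held_by": []}
            )
            if library not in entry["held_by"]:
                entry["held_by"].append(library)
    return union
```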

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
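
A minimal sketch of simultaneous searching, assuming each target catalogue is represented by a plain Python function (hypothetical stand-ins, not the interfaces of any real service). It also shows why the result depends on the availability of the target systems: a target that fails is reported rather than stopping the whole search.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, targets):
    """Send one query to several target catalogues in parallel.

    `targets` maps a catalogue name to a search function (hypothetical
    stand-ins for real library or bookshop interfaces). A target that
    fails is reported as unavailable instead of aborting the whole search.
    """
    results, unavailable = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        # Submit all searches first so that they run side by side.
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=10)
            except Exception:
                unavailable.append(name)
    return results, unavailable
```

Because only one query string is passed to every target, the sketch also illustrates the limitation noted above: search options that exist in only some target systems cannot be used.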

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                    RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic   Library of Congress, British Library, (Amazon)
To search for book titles published before 1990        national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                           Library of Congress, British Library, Infoball
To find the price of a book                            Global Books in Print, Infoball, online bookshops
To be informed regularly about new books               Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
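
The procedure described above can be sketched as a small query expansion routine. The miniature thesaurus and the function name expand_query are hypothetical; the sample entry for "sea" is invented for illustration and is not taken from any of the thesaurus systems mentioned in these slides.

```python
# Hypothetical miniature thesaurus; real systems hold far more terms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader terms
        "NT": ["inland sea"],      # narrower terms
        "RT": ["coast"],           # related terms
        "UF": ["ocean"],           # synonyms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its thesaurus relatives."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms with OR mainly raises recall; whether precision also improves depends on choosing terms that are more specific than the original query word.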

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
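
Both directions of citation searching can be sketched on a small citation graph. The data and the function names snowball and cited_by are hypothetical and serve only as an illustration; a real citation index covers millions of documents.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["old-3"],
    "new-1": ["seed"],
    "new-2": ["seed", "old-1"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, collecting older documents."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            # The references of a found document are followed as well.
            queue.extend(cites.get(doc, []))
    return found


def cited_by(target, cites):
    """Look forwards in time: which documents cite the target?"""
    return sorted(doc for doc, refs in cites.items() if target in refs)
```

The length of the list returned by cited_by is the number of citations received by a document within the data that the index happens to cover.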

211

***-

Information retrieval:
using citations (scheme)
[Scheme: seed document on a time axis (Past, Now, Future); snowball citation searching leads towards the past, citation indexing towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
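
The three variants listed above can be generated with a short helper. This is a sketch under the assumption that "index.html" is the default file name of the site, which depends on the web server; the function name url_variants is hypothetical.

```python
def url_variants(url):
    """Return the spelling variants of a page address worth searching for.

    Covers three forms: with the default file name, with a trailing slash,
    and without a trailing slash. Assumes that "index.html" is the default
    file name, which depends on the server.
    """
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant can then be submitted as a separate link search, and the results combined, because a search engine may treat the three spellings as different targets.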

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 87

1



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
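
The "simply stated" rule above can be sketched as a count of incoming links. This is a rough illustration only, not the method Google actually uses; the link graph, the site names and the function `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-d": ["site-c", "site-b"],
    "site-c": [],
}

def rank_by_inlinks(entries, links):
    """Sort directory entries by number of incoming links; ties alphabetically."""
    inlinks = Counter(t for targets in links.values() for t in targets)
    return sorted(entries, key=lambda e: (-inlinks[e], e))
```

An alphabetical listing would give site-a, site-b, site-c; the link-based ordering puts the most linked-to entry first.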

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
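
The two components described above (a machine-built index and a search function on top of it) can be sketched as follows. This is a minimal sketch under stated assumptions: `PAGES` is a hypothetical in-memory stand-in for what a robot would collect, the names `build_index` and `search` are invented for this example, and combining query words with AND is a choice made for the sketch, not a claim about any real system.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a "robot" would collect.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology databases and marine science links",
    "http://example.org/c": "Civil engineering portal",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words (assumed AND logic)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.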

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
are owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a query could contain at most 10 words.)
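
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch in the spirit of published descriptions of link analysis, not Google's actual implementation; the graph, the damping value 0.85 and the function name `link_rank` are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a graph {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical graph: three pages point to "target".
GRAPH = {
    "hub": ["a", "b"],
    "a": ["target"],
    "b": ["target"],
    "c": ["target"],
    "target": [],
}
SCORES = link_rank(GRAPH)
```

In this toy graph "target" receives the highest score, because several pages point to it and two of them are themselves pointed to by "hub".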

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
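
The idea of expanding a marked word with synonyms can be sketched as below. This only illustrates the principle; it does not show how Google processes the tilde internally. The mini-thesaurus `SYNONYMS` and the function `expand_query` are hypothetical.

```python
# Hypothetical mini-thesaurus; a real one would be far larger.
SYNONYMS = {
    "fish": ["fishes", "fisheries"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)
```

For instance, the query `pollution ~sea coast` would become `pollution (sea OR ocean OR marine) coast` with the thesaurus assumed here.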

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
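
Recall and precision, named above as the two measures of a result set, can be computed as in the following sketch. The document identifiers and the function name `precision_recall` are hypothetical; the definitions themselves are the standard ones (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the known relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a system retrieves 4 documents of which 2 are relevant, while 3 relevant documents exist in total, precision is 2/4 and recall is 2/3.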

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
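
What truncation means in retrieval systems that do support it can be sketched as follows. This illustrates the general feature only; the `*` wildcard syntax, the vocabulary and the function `truncation_match` are assumptions for this example.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'hydrolog*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the wildcard only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical index vocabulary.
VOCABULARY = ["hydrology", "hydrological", "hydrologist", "hydrogen", "biology"]
```

Without such a feature, the searcher has to type each word variant explicitly and combine the variants with OR.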

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
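
A proximity operator of the NEAR type, as offered by some other retrieval systems, can be sketched like this. The function name `near` and the default distance of 5 words are assumptions made for the illustration.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)
```

A plain AND search would accept a document in which the two words occur anywhere, however far apart; the proximity test is stricter.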

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
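
Clustering results on the fly can be pictured with a deliberately simple sketch: each retrieved title is filed under the term it shares with most other results. This is not Vivisimo's algorithm, which is not described in these slides; the stop word list, the sample titles (built around the "pascal" example used for Teoma) and the function `cluster_results` are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to"}

def cluster_results(query, titles):
    """Group result titles under their commonest shared term (query words excluded)."""
    skip = STOPWORDS | set(re.findall(r"[a-z]+", query.lower()))
    term_sets = [set(re.findall(r"[a-z]+", t.lower())) - skip for t in titles]
    freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        # Label: the term occurring in most results; ties broken alphabetically.
        label = min(terms, key=lambda t: (-freq[t], t)) if terms else "other"
        clusters[label].append(title)
    return dict(clusters)

RESULTS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Learn the Pascal programming language",
    "Pascal philosopher and mathematician",
]
```

Only the retrieved titles are analysed, at search time; nothing is computed in advance for the whole collection.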

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
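
The integration step described above (send one query to several search systems at the same time, merge the answers, remove repeated results) can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists; a real meta-search system would contact remote servers instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems; each returns a ranked URL list.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel threads and merge results, dropping repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

The simple merge used here keeps the first occurrence of each URL; combining the rankings of the different systems in a more refined way is left out of this sketch.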

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
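
The difference between a modified page and a new page can be made concrete with a small sketch that compares stored fingerprints of page contents. This is an illustration under stated assumptions, not the method of any service named in these slides; the page contents are passed in as plain strings, and the names `fingerprint` and `detect_changes` are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content (a string)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(stored, current_pages):
    """Compare current page contents with stored fingerprints.

    Returns (changed_urls, new_urls) and updates `stored` in place.
    """
    changed, new = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.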

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
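The matching of a new book against stored interest profiles can be sketched as follows. This is only an illustration under stated assumptions: the profile store `PROFILES`, the user names, and the function `users_to_alert` are hypothetical, and a real service such as the one offered by a bookshop will work differently internally.

```python
# Hypothetical interest profiles stored on the server of the system:
# each user has keywords and/or subject categories.
PROFILES = {
    "user1": {"keywords": ["marine", "fisheries"], "categories": ["Biology"]},
    "user2": {"keywords": ["thesaurus"], "categories": []},
}


def users_to_alert(book, profiles):
    """Return the users whose profile matches the newly published book."""
    title = book["title"].lower()
    alerted = []
    for user, profile in profiles.items():
        # A keyword matches when it occurs anywhere in the title.
        keyword_hit = any(k.lower() in title for k in profile["keywords"])
        # A category matches when the book is filed under it.
        category_hit = book.get("category") in profile["categories"]
        if keyword_hit or category_hit:
            alerted.append(user)
    return alerted


new_book = {"title": "Marine Pollution", "category": "Chemistry"}
print(users_to_alert(new_book, PROFILES))
```

Matching on keywords in the title is the simplest option; matching on subject categories depends on the quality of the subject description in the database.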

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
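The principle of simultaneous searching can be sketched as follows. This is a minimal illustration, assuming three hypothetical in-memory catalogues (`CATALOGUES`) instead of real library systems; real meta-search services send the query to remote target systems (for instance over Z39.50) and depend on their availability.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target databases; real targets are remote catalogues.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["marine ecology", "Fisheries Science"],
    "bookshop_c": ["Coastal Management", "Ocean Atlas"],
}


def search_catalogue(name, query):
    """Search one catalogue; only a simple title-word search is offered."""
    titles = CATALOGUES[name]
    return [(t, name) for t in titles if query.lower() in t.lower()]


def meta_search(query, names):
    """Send one query to several catalogues in parallel and merge results."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        result_lists = list(pool.map(lambda n: search_catalogue(n, query), names))
    merged = {}
    for results in result_lists:
        for title, source in results:
            key = title.lower()  # treat titles that differ only in case as one book
            merged.setdefault(key, {"title": title, "sources": []})
            merged[key]["sources"].append(source)
    return list(merged.values())


print(meta_search("marine", ["library_a", "library_b", "bookshop_c"]))
```

The sketch also shows why search options are limited: the meta-search can only offer what every target system supports, here a plain title search.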

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term, a thesaurus offers links to
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
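The use of thesaurus terms to formulate a query can be sketched as follows. This is a minimal illustration, assuming a hypothetical miniature thesaurus (`THESAURUS`) with invented example entries and a target database that accepts the Boolean operator OR; the relation codes BT, NT, RT and UF are the usual thesaurus relations.

```python
# Hypothetical miniature thesaurus; the entries are invented for illustration.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon", "estuary"],
        "BT": ["body of water"],
        "RT": ["coast"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its thesaurus terms into one OR query."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put multi-word terms between quotes so they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms with OR mainly increases recall; precision improves when the thesaurus leads the searcher to a more suitable term than the one first thought of.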

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
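Both directions of citation searching can be sketched on a small citation graph. This is only an illustration, assuming hypothetical document identifiers and citation data (`CITES`); a real citation index holds this kind of data for millions of documents.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "A2004": ["B2001", "C1998"],
    "B2001": ["C1998", "D1995"],
    "E2005": ["A2004"],
    "F2005": ["B2001"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    queue = list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The references of each found document are followed as well.
            queue.extend(cites.get(doc, []))
    return found


def citing_documents(seed, cites):
    """Find the more recent documents that cite the seed document."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print(snowball("A2004", CITES))          # older publications
print(citing_documents("B2001", CITES))  # more recent publications
```

The length of the list returned by `citing_documents` is the number of citations received by the seed document within this data set.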

211

***-

Information retrieval:
using citations (scheme)
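[Scheme: a time axis (Past, Now, Future). Starting from the seed document, snowball citation searching leads back to older publications; citation indexing leads forward to more recent publications.]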

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
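The variants listed above can be generated automatically before searching. This is a minimal sketch, assuming that the default document of the web server is named `index.html` or `index.htm` (other servers may use other names); the function name `url_variants` is hypothetical.

```python
from urllib.parse import urlsplit


def url_variants(url):
    """Return the common spellings under which a page may be linked."""
    parts = urlsplit(url)
    host = parts.netloc
    path = parts.path
    # Drop a trailing default file name (assumed: index.html or index.htm).
    for default in ("index.html", "index.htm"):
        if path.endswith("/" + default):
            path = path[: -len(default)]
            break
    base = path.rstrip("/")
    variants = []
    for candidate in (base + "/index.html", base + "/", base):
        variants.append("//" + host + candidate)
    return variants


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, following the query syntax of the search system that you use.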

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 88

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
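
The two components described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection of pages that a robot has already collected (the URLs and texts are invented for illustration) and it is not the implementation of any particular search engine. The names PAGES, build_index and search are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might already have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
```

The query is answered from the pre-built index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.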

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
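
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified, textbook-style iteration over a link graph. This is only a sketch under stated assumptions: the link graph, the function name link_scores and the damping value 0.85 are chosen for illustration, and the code is not Google's actual algorithm.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages from a link graph given as {page: [pages it links to]}.

    A page scores higher when many pages link to it and when the
    pages linking to it have a high score themselves.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical graph: "hub" receives three links; "d" and "f" receive one each.
GRAPH = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "e": ["f"]}
SCORES = link_scores(GRAPH)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this invented graph, "d" and "f" each receive exactly one link, but "d" ends up with the higher score because its single link comes from the well-linked page "hub".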

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
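
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function expand_query are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described in this presentation.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "sea": ["ocean"],
}


def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental", SYNONYMS))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.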

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
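
What a proximity operator such as NEAR does in systems that offer it can be sketched as follows. The function near and the default maximum distance of 5 words are assumptions for illustration; real systems define the distance in their own way.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True if both words occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


SENTENCE = "Open access to scientific information"
print(near(SENTENCE, "open", "information", max_distance=4))
```

A plain AND query only requires that both words occur somewhere in the document, which yields less precise results in long documents.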

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
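
Clustering "on the fly" means that the groups are derived only from the results that come back for the query. The sketch below shows one very simple way to do this: each result title is assigned to the non-query word it contains that occurs in the most result titles. This is an assumption made for illustration and not the method used by Vivisimo; the titles, the stop word list and the function cluster_results are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "for"}


def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))

    def terms(title):
        words = re.findall(r"[a-z0-9]+", title.lower())
        return {w for w in words
                if w not in STOPWORDS and w not in query_words}

    # Count in how many titles each candidate label occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title))

    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most widely shared word first; alphabetical order breaks ties.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


TITLES = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal language compiler",
    "Pascal philosopher and mathematician",
]
for label, group in cluster_results("pascal", TITLES).items():
    print(label, group)
```

For the ambiguous query pascal, the invented titles fall into one group about the philosopher and one about the programming language, without any processing of documents before the search.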

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
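
The integration of results with removal of repeated results can be sketched as follows. The ranked lists are assumed to have been obtained already from the primary search systems; the URLs are invented, and the functions normalize and merge_results are hypothetical. Interleaving the lists in round-robin order is only one possible way to combine them.

```python
def normalize(url):
    """Crude URL normalisation so that trivially different forms compare equal.

    Assumption: lower-casing the whole URL is acceptable here, although
    paths can be case-sensitive on real servers.
    """
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, dropping repeated URLs."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


ENGINE_1 = ["http://www.example.org/a/", "http://example.org/b"]
ENGINE_2 = ["http://example.org/b/", "http://example.org/c"]
print(merge_results([ENGINE_1, ENGINE_2]))
```

The page that both hypothetical engines return appears only once in the merged list.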

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
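
The difference between modified pages and new pages can be made concrete with a small sketch that compares two snapshots of a set of pages. Storing a fingerprint (hash) of each page instead of the full text is a common approach, but the snapshot data and the functions fingerprint and detect_changes are hypothetical, and the sketch does not describe how any particular alert service works.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots {url: page text}.

    Returns a pair (new URLs, modified URLs), each sorted alphabetically.
    """
    old = {url: fingerprint(text) for url, text in previous.items()}
    new_pages = []
    modified = []
    for url, text in current.items():
        if url not in old:
            new_pages.append(url)
        elif old[url] != fingerprint(text):
            modified.append(url)
    return sorted(new_pages), sorted(modified)


LAST_WEEK = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About us",
}
TODAY = {
    "http://example.org/news": "Conference programme published",
    "http://example.org/about": "About us",
    "http://example.org/papers": "New papers",
}
print(detect_changes(LAST_WEEK, TODAY))
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.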

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
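
As an illustration of how a keyword-based interest profile could be matched against newly published titles, here is a minimal sketch in Python. The function names and the sample titles are hypothetical; real bookshop services may work quite differently.

```python
def matches_profile(title: str, keywords: set) -> bool:
    """Return True when at least one profile keyword occurs as a word in the title."""
    # Normalise the title: split on whitespace, strip punctuation, lower-case
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return bool(words & {k.lower() for k in keywords})


def alerts_for(new_titles: list, keywords: set) -> list:
    """Select the new titles that fit the stored interest profile."""
    return [t for t in new_titles if matches_profile(t, keywords)]


# Usage with invented titles and an invented profile
profile = {"marine", "aquaculture"}
new_books = ["Marine Ecology of the North Sea", "Cooking for Beginners"]
print(alerts_for(new_books, profile))  # ['Marine Ecology of the North Sea']
```

A profile based on subject categories would work in the same way, with the comparison made on the category assigned to each book instead of on title words.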

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
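
The idea of one search action over several catalogues can be sketched as follows in Python. This is only an illustration: the target catalogues are represented by hypothetical functions that return records with an "isbn" and a "title" field, and records are merged on ISBN. It also shows why the result depends on the availability of the target systems: a target that fails is skipped and reported.

```python
def meta_search(query: str, targets: dict):
    """Send one query to several catalogues and merge the results.

    targets: name -> search function (hypothetical stand-ins for library
             and bookshop catalogues); each returns a list of records.
    Returns (merged records keyed by ISBN, names of targets that failed).
    """
    merged = {}
    failed = []
    for name, search in targets.items():
        try:
            records = search(query)
        except Exception:
            # An unavailable target reduces coverage but does not stop the search
            failed.append(name)
            continue
        for rec in records:
            entry = merged.setdefault(rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(name)
    return merged, failed
```

Only the query features shared by all targets can be passed on in this way, which is one reason why the search options of such services tend to be limited.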

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus systems
to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
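
The use of a thesaurus to prepare a query can be sketched as follows in Python. The tiny thesaurus below is invented for illustration only (its entries are not taken from any real thesaurus), and the OR syntax is an assumption; the query syntax of the database to be searched may differ.

```python
# Hypothetical thesaurus fragment: term -> relations (BT, NT, RT, UF)
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR-query from a term plus the terms found via the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for rel in relations:
        for candidate in entry.get(rel, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so that they are searched as phrases
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))  # sea OR ocean OR "inland sea"
```

Adding synonyms (UF) and narrower terms (NT) tends to increase recall; replacing a vague word by a more suitable term from the thesaurus is the step that helps precision.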

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
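
Both directions of citation searching can be illustrated with a small citation graph. The sketch below is in Python; the document identifiers and their reference lists are invented, and a real citation index holds far more data than this dictionary.

```python
from collections import deque

# Hypothetical data: document -> documents listed in its references
REFERENCES = {
    "seed": ["A1990", "B1995"],
    "A1990": ["C1980"],
    "B1995": [],
    "C1980": [],
    "D2003": ["seed"],
    "E2004": ["seed", "D2003"],
}


def snowball(seed: str, references: dict) -> list:
    """Follow reference lists back in time, starting from the seed document."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for ref in references.get(doc, []):
            if ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append(ref)
    return found


def citing_documents(target: str, references: dict) -> list:
    """Find the (more recent) documents whose references include the target."""
    return sorted(doc for doc, refs in references.items() if target in refs)


print(snowball("seed", REFERENCES))          # ['A1990', 'B1995', 'C1980']
print(citing_documents("seed", REFERENCES))  # ['D2003', 'E2004']
```

The first function needs only the reference lists of documents already in hand; the second needs an index over the reference lists of many other documents, which is what a citation index provides.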

211

***-

Information retrieval:
using citations (scheme)
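(Scheme: a time axis running from Past through Now to Future,
with the seed document placed on it.
Snowball citation searching leads from the seed document
back to older publications;
citation indexing leads from the seed document
forward to more recent publications that cite it.)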

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
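
The variants listed above can be generated automatically before running the link searches. A minimal sketch in Python follows; the function name is hypothetical, and it assumes that the default document of the site is called index.html (or index.htm), which is not true for every web server.

```python
def url_variants(url: str) -> list:
    """Return the common forms under which the same page may be linked."""
    base = url
    # Strip an explicit default document name, if present (assumed names)
    for default_doc in ("index.html", "index.htm"):
        if base.endswith("/" + default_doc):
            base = base[: -len(default_doc)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://www.example.org/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the result lists combined.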

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 89

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
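A minimal sketch of the tree structure behind such a directory, assuming a tiny invented hierarchy. The category names and example.org addresses are placeholders and are not taken from any real directory.

```python
# Hypothetical miniature directory: categories are nested dicts,
# the selected sites of a category are kept in a list.
directory = {
    "Science": {
        "Earth Sciences": {
            "Hydrology": ["http://example.org/hydrology-guide"],
            "Oceanography": ["http://example.org/ocean-portal"],
        },
        "Biology": {
            "Marine Biology": ["http://example.org/marine-biology"],
        },
    },
}


def browse(tree, path):
    """Follow a list of category names down the tree and return what is found there."""
    node = tree
    for category in path:
        node = node[category]
    return node


def list_sites(node):
    """Collect every site below a node, whatever the depth."""
    if isinstance(node, list):
        return list(node)
    sites = []
    for child in node.values():
        sites.extend(list_sites(child))
    return sites
```

Browsing means choosing one category after the other, e.g. `browse(directory, ["Science", "Earth Sciences"])`; no query has to be formulated.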

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
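The "simply stated" rule above can be sketched as follows. This is only an illustration of counting incoming links, not the method actually used by Google; the function name and the data layout are assumptions.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of distinct pages that link to them.

    entries: list of site names in one directory category
    links:   list of (source, target) pairs observed on the WWW
    """
    inlinks = Counter()
    for source, target in set(links):  # count each (source, target) pair once
        if source != target:           # ignore links from a site to itself
            inlinks[target] += 1
    # Most linked first; ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-inlinks[site], site))
```

With no link data at all, the result is the plain alphabetical order described for the DMOZ directory.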

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram. Elements: the complete WWW; a global subject directory, which can lead to a directory limited to sources in/of a country or region, or to a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
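The two components described above can be sketched in a few lines, assuming the pages have already been collected by a robot. The function names are illustrative, and real systems add ranking, word positions and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: for every word, the set of URLs whose text contains it.

    pages: dict mapping URL -> page text (the output of the robot/crawler)
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why results can be out of date.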

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
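A simplified sketch of link-based ranking in the spirit of the two rules above: a page scores higher when many pages point to it and when "important" pages point to it. This is a textbook-style iteration, not the algorithm as implemented by Google; the damping value and the number of iterations are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a link-based score for every page.

    links: dict mapping a page to the list of pages it links to
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In the small example `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page c is linked by two pages and ends highest; page a is linked only once, but by the important page c, and therefore still ends above b.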

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
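What such an automatic expansion amounts to can be sketched as below. The synonym table is invented for the illustration; the synonym list actually used by Google is not public and is not reproduced here.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(term)
    return " ".join(parts)
```

Only the words marked with a tilde are expanded; the other words of the query stay as they are.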

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
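Recall and precision, as used above, are the standard measures of a result set: precision is the fraction of the retrieved items that are relevant, recall is the fraction of the relevant items that were retrieved. A minimal sketch, assuming that the set of relevant items is known (which in practice it rarely is):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: the items returned by the search system
    relevant:  the items that are actually relevant for the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if 4 items are retrieved, 2 of them are relevant, and 3 relevant items exist in total, then precision is 2/4 and recall is 2/3.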

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
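For comparison, this is roughly what truncation does in a retrieval system that supports it: a pattern such as librar* is matched against the words in the index. The `*` symbol and the function name are assumptions for this sketch; systems differ in their truncation symbol.

```python
def expand_truncation(pattern, vocabulary):
    """Return the index words that match a right-truncated pattern such as 'librar*'."""
    pattern = pattern.lower()
    words = {word.lower() for word in vocabulary}
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in words if word.startswith(stem))
    # Without a truncation symbol, only the exact word matches.
    return [pattern] if pattern in words else []
```

Where truncation is not offered, the searcher has to type the variants by hand, for instance joined with OR: `" OR ".join(expand_truncation("librar*", vocabulary))`.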

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
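A proximity operator such as NEAR checks how far apart two words stand in a text. A minimal sketch, assuming a maximum distance counted in words; the default of 5 is an arbitrary choice for the illustration.

```python
import re


def near(text, word1, word2, distance=5):
    """True if word1 and word2 occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)
```

Such a test needs the positions of the words in the full text, which is one reason why not every index offers it.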

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram. Elements: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
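To give an idea of clustering "on the fly", the following is a very crude sketch that groups result snippets under a keyword they share. It is an invented illustration and not the method used by Vivisimo; the stop word list and the labelling rule are assumptions.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stop word list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was", "by", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared word that is not in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def terms(snippet):
        return [w for w in re.findall(r"[a-z]+", snippet.lower())
                if w not in STOPWORDS and w not in query_words]

    # Count in how many snippets each term occurs.
    frequency = Counter()
    for snippet in snippets:
        frequency.update(set(terms(snippet)))

    clusters = defaultdict(list)
    for snippet in snippets:
        shared = [t for t in terms(snippet) if frequency[t] > 1]
        # Label: the most widely shared term; ties are broken alphabetically.
        label = max(shared, key=lambda t: (frequency[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)
```

Everything is computed from the retrieved snippets at search time, so no pre-processing of the documents is needed. For an ambiguous query such as pascal, snippets about the philosopher and snippets about the programming language end up under different headings.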

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
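The integration step can be sketched as follows: the ranked lists of the primary search systems are merged and repeated results are removed. The round-robin merging and the URL normalisation rules are assumptions for this illustration; real meta-search systems use their own, often more refined, rules.

```python
def normalize(url):
    """Reduce a URL to a simple key for comparison.

    Simplification: the whole URL is lowercased, although paths can be case-sensitive.
    """
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping the first copy of each URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged
```

Taking the results position by position gives every primary search system an equal say in the order of the merged list.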

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Elements of the Internet shown:
• telnet, ftp, ...
• the WWW
• databases and file archives accessible through the Internet (CGI, ASP, ...)
• static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• information accessible only when passwords are used
• rapidly changing information, such as news
• Word files
• PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
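How a tracking service can tell modified pages from new ones may be sketched as below, assuming that a fingerprint of each page was stored at the previous visit. The function names and the data layout are illustrative; fetching the pages and sending the alert are left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a short fixed-length fingerprint of a page's text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare the current state of the tracked pages with the previous visit.

    previous: dict mapping URL -> fingerprint stored at the previous visit
    current:  dict mapping URL -> page text fetched now
    Returns (new_pages, modified_pages), both as sorted lists of URLs.
    """
    new_pages = sorted(url for url in current if url not in previous)
    modified_pages = sorted(
        url for url, text in current.items()
        if url in previous and previous[url] != fingerprint(text)
    )
    return new_pages, modified_pages
```

Storing only fingerprints keeps the service small; the price is that it can report that a page changed, but not what changed.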

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
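
As an illustration of how a stored interest profile could be matched against newly published books, here is a minimal sketch. The record fields (`title`, `category`) and the profile layout are assumptions made for this example; they do not describe the internals of Amazon or any other service.

```python
def title_words(title: str) -> set[str]:
    """Lower-case the title and strip simple punctuation from each word."""
    return {word.strip(".,:;!?()\"'").lower() for word in title.split()}


def matches_profile(book: dict, profile: dict) -> bool:
    """A book matches when a profile keyword occurs in its title
    or when its subject category is listed in the profile."""
    words = title_words(book["title"])
    keyword_hit = any(keyword.lower() in words for keyword in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def alerts_for(new_books: list[dict], profile: dict) -> list[str]:
    """Return the titles of the new books that fit the stored profile."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]
```

The two profile forms mentioned on the slide map onto the `keywords` and `categories` entries of the hypothetical profile dictionary.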

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
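
The merging step of such a meta-search service can be sketched as follows. The sketch assumes that each target catalogue has already returned a list of records; the record fields (`title`, `isbn`) and the source names are hypothetical, and real services such as those named in this section may work quite differently.

```python
def merge_results(results_by_source: dict[str, list[dict]]) -> list[dict]:
    """Merge the result lists of several catalogues into one list.

    Records are treated as the same book when they share an ISBN or,
    if no ISBN is given, the same title after normalising case and spacing.
    """
    merged: dict[str, dict] = {}
    for source, records in results_by_source.items():
        for record in records:
            key = record.get("isbn") or " ".join(record["title"].lower().split())
            entry = merged.setdefault(key, {"title": record["title"], "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())
```

Because only fields that every target system returns can be used for merging, the sketch also shows why the search options of a meta-search service tend to be limited.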

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
(NOT full-text) database of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus for searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
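
The use of a thesaurus to prepare a query can be sketched as follows. The tiny thesaurus below is invented for illustration only, using the relation codes BT, NT, RT and UF; the `OR` query syntax is also an assumption, since each search system has its own syntax.

```python
# Hypothetical thesaurus entry, for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],          # synonyms
        "BT": ["body of water"],  # broader terms
        "NT": ["North Sea"],      # narrower terms
        "RT": ["coast"],          # related terms
    },
}


def expand_query(term: str, thesaurus: dict, relations: tuple = ("UF", "NT")) -> str:
    """Build an OR query from a term plus the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms in quotation marks so they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))  # sea OR ocean OR "North Sea"
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague term by a narrower one found in the thesaurus is one way to increase precision.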

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
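
Both directions of citation searching can be illustrated with a small sketch. It assumes that the reference lists of a set of documents are available as a simple mapping; the document identifiers used in the example are invented.

```python
def build_citation_index(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the reference lists: for each cited document, list the documents citing it."""
    cited_by: dict[str, list[str]] = {}
    for document, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(document)
    return cited_by


def snowball(seed: str, references: dict[str, list[str]], depth: int = 2) -> list[str]:
    """Follow reference lists backwards in time, starting from the seed document."""
    found: list[str] = []
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for document in frontier:
            for ref in references.get(document, []):
                if ref != seed and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


# Invented example data: document -> documents it cites.
references = {
    "seed2000": ["a1995", "b1990"],
    "a1995": ["c1980"],
    "x2003": ["seed2000"],
    "y2004": ["seed2000", "a1995"],
}
older = snowball("seed2000", references)                 # backwards: older publications
newer = build_citation_index(references)["seed2000"]     # forwards: later citing publications
```

The snowball search needs only the documents themselves, whereas the forward search requires a citation index built over a large collection, which is what a citation database provides.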

211

***-

Information retrieval:
using citations (scheme)
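[Scheme on a time axis (past, now, future): snowball citation
searching leads from the seed document back to the past;
citation indexing leads from the seed document forward,
towards now and the future.]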

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
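
The variants listed above can be generated automatically. The following is only a sketch: it assumes that the default page of a site is called `index.html` or `index.htm`, which is common but not guaranteed, and the function name is hypothetical.

```python
def url_variants(url: str) -> list[str]:
    """Return the forms under which other pages may link to the same document."""
    variants = [url]
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            base = url[: -len(index_name)]     # directory form, with trailing slash
            variants.append(base)
            variants.append(base.rstrip("/"))  # directory form, without trailing slash
            return variants
    if url.endswith("/"):
        variants.append(url.rstrip("/"))
    else:
        variants.append(url + "/")
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system, and the results combined.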

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
NOT ONLY an Internet index search engine with specialised
searching in the hyperlink fields of web pages,
but ALSO a more “normal” search for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
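
The principle of this "normal" search can be shown on a small scale. The sketch assumes that the texts of a set of pages are already available in memory; a real search engine works on its own index, and the function name and example pages are invented.

```python
def pages_citing(url: str, pages: dict[str, str]) -> list[str]:
    """Return the addresses of pages whose text contains the given URL as a string.

    pages maps the address of each page to its text.
    The page that lives at the URL itself is not counted as citing itself.
    """
    needle = url.lower()
    return [
        page_url
        for page_url, text in pages.items()
        if page_url != url and needle in text.lower()
    ]
```

Unlike a search restricted to hyperlink fields, this approach also finds pages that mention the URL as plain text without making a clickable link.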

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes of the kind provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
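
The idea of ranking directory entries by the links they receive can be sketched as follows. This is only an illustration under simple assumptions: the function name `rank_by_inbound_links` and the sample link data are hypothetical, and the real link analysis by Google is more elaborate than a plain count.

```python
from collections import Counter


def rank_by_inbound_links(links, entries):
    """Order directory entries by the number of distinct links they receive.

    links   -- iterable of (source_site, target_site) pairs (hypothetical data)
    entries -- the sites listed in one directory category
    """
    inbound = Counter()
    for source, target in set(links):      # count each distinct link once
        if source != target:               # ignore links from a site to itself
            inbound[target] += 1
    # Most-linked sites first; ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-inbound[site], site))


links = [("a.example", "x.example"), ("b.example", "x.example"),
         ("c.example", "y.example"), ("a.example", "x.example")]
ranking = rank_by_inbound_links(links, ["y.example", "x.example", "z.example"])
print(ranking)
```

Sites without any received link keep their alphabetical order at the end of the list, which mirrors the plain alphabetical listing in DMOZ.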

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW
a global subject directory
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
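
The mechanism described above (collected pages, an index built from their texts, and a query interface) can be illustrated with a very small inverted index. This is a minimal sketch: the page texts and the `example.org` addresses are invented, the crawling step is replaced by a ready-made dictionary, and real search systems add ranking, phrase handling and much more.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for raw in text.split():
            word = raw.strip(".,;:!?()\"'").lower()
            if word:
                index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Stand-in for the pages that a "robot" would collect from the Internet.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine biology")))
```

The query is answered from the index alone, so the pages themselves are not contacted at search time.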

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
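
The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be combined in an iterative link score. The sketch below follows the general idea of the published PageRank method; it is an assumption-laden illustration, not the actual Google implementation, and the function name `link_scores`, the damping value and the sample links are chosen only for this example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links -- list of (source_page, target_page) pairs (hypothetical data)
    """
    pages = sorted({page for edge in links for page in edge})
    if not pages:
        return {}
    n = len(pages)
    outgoing = {p: [t for s, t in links if s == p] for p in pages}
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = outgoing[p] or pages
            share = damping * score[p] / len(targets)
            for t in targets:
                new_score[t] += share
        score = new_score
    return score


links = [("a", "c"), ("b", "c"), ("c", "a")]
scores = link_scores(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small example page "c" receives two links and ranks first; page "a" receives only one link, but from the high-scoring page "c", so it still ranks well above page "b", which receives none.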

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has been possible since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
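
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical and serve only as an illustration; how Google selects its synonyms is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    terms = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                terms.append("(" + " OR ".join(alternatives) + ")")
            else:
                terms.append(word)      # no synonyms known: keep the word
        else:
            terms.append(token)         # words without a tilde stay unchanged
    return " ".join(terms)


print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left as they are.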

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
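
Recall and precision, the two quality measures mentioned above, can be computed for a single search as follows. This is a small sketch; the function name and the document numbers are invented for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A search retrieves documents 1-4; documents 2, 4 and 5 are relevant.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 5})
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents have been found (recall about 0.67).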

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
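
For readers who do not know truncation from other retrieval systems: the sketch below shows what manual right-hand truncation with a wildcard does in a system that supports it. The `*` symbol, the function name and the word list are assumptions for this illustration; the truncation symbol differs between systems.

```python
def truncation_matches(pattern, vocabulary):
    """Return the index words matched by a query term.

    A trailing '*' means right-hand truncation: 'librar*' matches every
    word that starts with 'librar'. Without '*', only the exact word matches.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncation_matches("librar*", vocabulary))
print(truncation_matches("library", vocabulary))
```

Without this feature, the searcher has to enter the variants (library, libraries, librarian...) explicitly in the query.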

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
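
A proximity operator such as NEAR tests whether two words occur within a few words of each other. A minimal sketch of that test is given below; the function name, the default distance and the sample sentence are chosen only for this example.

```python
def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur at most max_distance words apart."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Open access journals are free of charge."
print(near(sentence, "open", "journals", max_distance=2))
print(near(sentence, "open", "charge", max_distance=2))
```

A plain AND query would accept both word pairs; the proximity test accepts only the pair that occurs close together, which usually improves precision.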

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
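
Clustering on the fly can be illustrated with a toy example that groups the retrieved snippets under a word they share. This is only a rough sketch with invented snippets and a hypothetical function `cluster_results`; it does not represent the method used by Vivisimo.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to"}


def tokens(text):
    """Lower-cased words of a text, without surrounding punctuation."""
    cleaned = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in cleaned if w]


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word they contain."""
    query_words = set(tokens(query))
    # In how many snippets does each candidate heading occur?
    frequency = Counter()
    for snippet in snippets:
        frequency.update({w for w in tokens(snippet)
                          if w not in STOPWORDS and w not in query_words})
    clusters = defaultdict(list)
    for snippet in snippets:
        shared = [w for w in tokens(snippet) if frequency[w] >= 2]
        label = max(shared, key=lambda w: (frequency[w], w)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Pensees by the philosopher Pascal",
    "Pascal unit of pressure",
]
clusters = cluster_results("pascal", snippets)
for heading, members in clusters.items():
    print(heading, members)
```

The headings are derived from the retrieved results themselves at search time, so nothing has to be classified in advance.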

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
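
The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration under assumptions: the result lists and the `example` addresses are invented, the lists are interleaved by rank, and two addresses count as the same result when they differ only in the case of the host name or in a trailing slash.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Key used to recognise the same page in different result lists."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return (parts.netloc.lower(), path, parts.query)


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated results."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


engine_1 = ["http://www.example.org/ocean/", "http://example.com/link/"]
engine_2 = ["http://EXAMPLE.com/link", "http://www.example.net/"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

The page that both search systems return appears only once in the merged list, at the position of its best rank.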

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Elements shown in the scheme:
»Internet services besides the WWW: telnet, ftp, ...
»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Databases and file archives accessible through
the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
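
How a tracking service can tell modified pages from new pages is sketched below: it keeps a fingerprint of each page from the previous visit and compares it with the current content. The function names and the sample data are hypothetical, and the downloading of the pages is left out of this illustration.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Short, fixed-length summary of a page's content."""
    return hashlib.sha256(content).hexdigest()


def changed_pages(previous, current_contents):
    """Compare stored fingerprints with the pages as they are now.

    previous         -- dict: URL -> fingerprint from the previous visit
    current_contents -- dict: URL -> page content (bytes) of this visit
    """
    report = {"new": [], "modified": []}
    for url, content in sorted(current_contents.items()):
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(content):
            report["modified"].append(url)
    return report


previous = {
    "http://example.org/a": fingerprint(b"version 1"),
    "http://example.org/b": fingerprint(b"unchanged"),
}
current = {
    "http://example.org/a": b"version 2",
    "http://example.org/b": b"unchanged",
    "http://example.org/c": b"brand new page",
}
report = changed_pages(previous, current)
print(report)
```

An alert service would run such a comparison at regular intervals and send the report to the subscriber by e-mail.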

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even services that compare the catalogues and prices of
several bookshops (as well as shops for music, movies and
many other goods) are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
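
As an illustration only, the matching of new books against a stored interest profile can be sketched as follows. The profile structure, the sample book records and the matching rule (any keyword in the title, or any shared subject category) are assumptions for this sketch, not the method of any particular bookshop.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Interest profile stored on the server: lower-case keywords and subjects."""
    keywords: set = field(default_factory=set)
    subjects: set = field(default_factory=set)


def matches(profile: InterestProfile, title: str, subjects: list) -> bool:
    """True when a title word is a profile keyword or a subject category is shared."""
    title_words = {word.strip(".,:;!?").lower() for word in title.split()}
    keyword_hit = bool(profile.keywords & title_words)
    subject_hit = bool(profile.subjects & {s.lower() for s in subjects})
    return keyword_hit or subject_hit


# Hypothetical profile and hypothetical newly published books.
profile = InterestProfile(keywords={"aquaculture"}, subjects={"marine biology"})
new_books = [
    ("Aquaculture in Europe", ["fisheries"]),
    ("Cooking for Beginners", ["food"]),
    ("Life in the Deep", ["Marine Biology"]),
]

alerts = [title for title, subjects in new_books if matches(profile, title, subjects)]
print(alerts)
```

The first book matches on a keyword and the third on a subject category, so these two titles would be reported to the user.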

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
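
The idea of simultaneous searching can be sketched as below, assuming each target catalogue can be queried by a title word and returns records with an ISBN. The catalogue names, records and ISBN values are invented for the illustration; real meta-search services query remote systems (for instance through Z39.50) instead of in-memory lists.

```python
def search_catalogue(records, query):
    """Very simple title search inside one catalogue."""
    q = query.lower()
    return [record for record in records if q in record["title"].lower()]


def meta_search(catalogues, query):
    """Send one query to every catalogue and merge the hits by ISBN."""
    merged = {}
    for name, records in catalogues.items():
        for record in search_catalogue(records, query):
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(name)
    return merged


# Hypothetical catalogues with made-up ISBN values.
catalogues = {
    "Library A": [
        {"isbn": "111", "title": "Coastal Ecology"},
        {"isbn": "222", "title": "Medieval History"},
    ],
    "Bookshop B": [
        {"isbn": "111", "title": "Coastal Ecology"},
        {"isbn": "333", "title": "Ecology of Estuaries"},
    ],
}

results = meta_search(catalogues, "ecology")
for isbn, entry in results.items():
    print(isbn, entry["title"], entry["sources"])
```

The sketch also shows why search options are limited: only a query form that every target system understands (here a single title word) can be sent to all of them.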

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed
with descriptors (words and terms) taken from a controlled
list, the searcher can first consult one or several
thesaurus systems to find additional and more suitable
words and terms, and can then use these to formulate a
query for that database (to increase recall and
precision).
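
A small sketch of this use of a thesaurus: the searcher looks up a term and combines it with its synonyms (UF) and narrower terms (NT) into one Boolean OR query. The toy thesaurus entry, the function name and the OR syntax are assumptions for illustration; the query syntax of a real database should be taken from its help pages.

```python
# Toy thesaurus with one hypothetical entry, using the relation codes BT, NT, RT, UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon"],
        "BT": ["body of water"],
        "RT": ["coast"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its related terms into a Boolean OR query string."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
print(expand_query("plankton", THESAURUS))
```

Adding synonyms and narrower terms tends to raise recall; replacing a vague word by the most suitable term found in the thesaurus tends to help precision. A term that is absent from the thesaurus is simply searched as given.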

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
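
Both directions of citation searching can be illustrated with a tiny, invented citation network. The document identifiers and the two helper functions are hypothetical; a real citation index is built from the reference lists of very many publications.

```python
# Hypothetical reference lists: each document points to the older work it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1985"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def snowball(document, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [document]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in references.get(doc, []):
                if cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


older = snowball("seed-2000", REFERENCES)
citation_index = build_citation_index(REFERENCES)
newer = citation_index.get("seed-2000", [])

print("Older publications found by snowballing:", older)
print("More recent publications citing the seed:", newer)
```

Snowballing needs only the documents themselves, while finding the more recent citing publications requires the inverted index, which is why a citation index service is needed for that direction.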

211

***-

Information retrieval:
using citations (scheme)
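[Scheme: a time axis (Past, Now, Future). Starting from the
seed document, snowball citation searching leads back to
older publications; citation indexing leads forward to more
recent publications.]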

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
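
The variants listed above can be generated mechanically. The sketch below assumes that the default page of the site is named index.html; the function name is hypothetical, and how each variant is then entered in a link search depends on the search system used.

```python
def url_variants(url, index_name="index.html"):
    """Return the three common spellings under which the same page may be linked."""
    base = url
    if base.endswith("/" + index_name):
        base = base[: -len(index_name)]
    base = base.rstrip("/")
    return [f"{base}/{index_name}", f"{base}/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Whichever of the three spellings is given as input, the same list of variants comes out, so none of them is forgotten when the link searches are repeated.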

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can use NOT ONLY an
Internet index / search engine with specialised searching
in the hyperlink fields of web pages,
but ALSO a more “normal” search for web documents/pages
that contain words or exact phrases that occur in the URL
of that document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 91

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
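
The difference between alphabetical order and ordering by received links can be sketched in a few lines. This is only an illustration: the entry names and link counts are invented, and the real link analysis by Google is more elaborate than a simple count.

```python
# Hypothetical directory entries and in-link counts (invented for illustration).
ENTRIES = ["Alpha Site", "Beta Site", "Gamma Site"]
INLINKS = {"Alpha Site": 3, "Gamma Site": 40}  # "Beta Site" has no known links

def rank_by_inlinks(entries, inlinks):
    """Order entries by descending in-link count; ties fall back to the alphabet."""
    return sorted(entries, key=lambda name: (-inlinks.get(name, 0), name))

alphabetical = sorted(ENTRIES)                 # the plain alphabetical listing
by_links = rank_by_inlinks(ENTRIES, INLINKS)   # the link-based listing
print(alphabetical)
print(by_links)
```

With these invented numbers, the site that receives the most links moves to the top of the list, although it comes last alphabetically.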

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
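
The mechanism described above can be illustrated with a minimal sketch of an inverted index. The three pages and their URLs are invented stand-ins for what a crawler would collect; a real system also handles ranking, updating and far larger volumes.

```python
from collections import defaultdict

# Hypothetical miniature "crawl": URL -> page text (invented for illustration).
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "civil engineering handbook",
}

def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is why such systems can answer quickly.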

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
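
The two ranking factors above (many pages pointing to a page, and "important" pages pointing to it) can be illustrated with a simplified iterative link score. This is a sketch in the spirit of published link-analysis methods, not the actual algorithm used by Google; the small link graph, the damping value and the number of iterations are assumptions chosen for illustration.

```python
# Hypothetical link graph: page -> pages it points to (invented for illustration).
LINKS = {
    "home": ["news", "about"],
    "news": ["home"],
    "about": ["home"],
    "blog": ["home"],
}

def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score; each page passes its score on to its targets."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score

SCORES = link_rank(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this invented graph, "home" receives links from three pages and ends up with the highest score, while "blog", to which no page points, ends up with the lowest.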

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
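
What such an automatic expansion amounts to can be sketched as follows. The synonym table is invented, and the way Google selects synonyms internally is not documented here; the sketch only shows how a word marked with a tilde could be turned into a group of alternatives combined with OR.

```python
# Hypothetical synonym table, invented for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each ~word as an OR group of the word and its listed synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("rent ~car brussels"))
```

Only the marked word is expanded; the other words of the query are left unchanged.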

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
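
Recall and precision can be computed as shown in the following sketch. Recall is the fraction of all relevant documents that was retrieved; precision is the fraction of the retrieved documents that is relevant. The document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

In this invented case, 2 of the 4 retrieved documents are relevant (precision 2/4) and 2 of the 3 relevant documents were found (recall 2/3).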

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
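
For readers who do not know truncation from other retrieval systems, the following sketch shows what manual right-hand truncation with an asterisk means. The word list is invented, and the asterisk notation is the convention of many bibliographic databases, used here only as an assumption for illustration.

```python
def truncation_match(pattern, words):
    """Return the words matched by a pattern with an optional trailing '*'."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern]

# Hypothetical vocabulary of an index.
WORDS = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", WORDS))
print(truncation_match("library", WORDS))
```

Without truncation, the searcher has to enter each word variant separately, for instance combined with OR.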

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram components: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
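
Clustering "on the fly" can be illustrated with a very simple sketch that groups retrieved snippets under the most widely shared word that is not part of the query. This is not the method of Vivisimo, which is not described here; the snippets, the stop word list and the grouping rule are assumptions for illustration.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "by"}

def cluster_snippets(query, snippets):
    """Group snippets under the most widely shared word that is not in the query."""
    query_words = set(query.lower().split())

    def terms(text):
        return {w for w in text.lower().split()
                if w not in STOP_WORDS and w not in query_words}

    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets retrieved for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Thoughts by the philosopher Blaise Pascal",
    "Free Pascal compiler for the programming language",
]
CLUSTERS = cluster_snippets("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, members)
```

The grouping uses only the retrieved snippets themselves, so nothing has to be prepared before the search.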

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram components: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
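
The integration of results with removal of repeated results can be sketched as follows. The result lists, the round-robin merging rule and the crude URL normalisation are assumptions chosen for illustration; real meta-search systems may merge and rank differently.

```python
from itertools import zip_longest

def normalise(url):
    """Crude normalisation so that trivially different forms of a URL compare equal.

    Lower-casing the whole URL is a simplification: paths can be case-sensitive.
    """
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each URL."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:
                continue
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical ranked results returned by two primary search systems.
ENGINE_A = ["http://www.vub.ac.be/", "http://bubl.ac.uk/link/"]
ENGINE_B = ["http://vub.ac.be", "http://www.dmoz.org/", "http://www.lii.org/"]
MERGED = merge_results(ENGINE_A, ENGINE_B)
print(MERGED)
```

The page that both systems return is shown only once, in the form in which it was first encountered.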

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
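The distinction between modified and new pages can be illustrated with a minimal sketch. It is not how any of the services named in these slides works internally; the function names, the fingerprinting approach and the example URLs are assumptions made for illustration only. A tracker stores a fingerprint of each page it has seen and compares it with the fingerprint of the page at the next visit.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored: dict, fetched: dict) -> dict:
    """Compare freshly fetched pages with the fingerprints of the previous visit.

    stored  maps URL -> fingerprint remembered from earlier visits (updated in place)
    fetched maps URL -> page text retrieved during the current visit
    """
    report = {"modified": [], "new": []}
    for url, text in fetched.items():
        current = fingerprint(text)
        if url not in stored:
            report["new"].append(url)        # never seen before
        elif stored[url] != current:
            report["modified"].append(url)   # seen before, content changed
        stored[url] = current                # remember for the next visit
    return report


# Hypothetical example data; a real tracker would download the pages itself.
seen = {}
check_pages(seen, {"http://example.org/a": "Open access news"})
report = check_pages(seen, {
    "http://example.org/a": "Open access news, updated",
    "http://example.org/b": "A page that did not exist before",
})
print(report)
```

An alert service would then send the report to the subscriber by email instead of printing it.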

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM                                                   RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic  ?
To find book titles published before 1990             ?
To find a book title through a title search           ?
To find the price of a book                           ?
To be informed regularly about new books              ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
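As a rough illustration of how a stored interest profile could be matched against a newly published book, consider the sketch below. The class names, fields and matching rule are hypothetical and do not describe the alert service of any particular bookshop; they only show the two profile forms mentioned above (keywords and subject categories).

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Profile stored on the server: keywords and/or subject categories."""
    keywords: set = field(default_factory=set)
    subject_categories: set = field(default_factory=set)


@dataclass
class BookRecord:
    """Minimal description of a newly published book."""
    title: str
    subject_categories: set


def matches(profile: InterestProfile, book: BookRecord) -> bool:
    """A book fits the profile if a keyword occurs in its title
    or if it shares at least one subject category with the profile."""
    title_words = {word.strip(".,:;!?").lower() for word in book.title.split()}
    keyword_hit = any(keyword.lower() in title_words for keyword in profile.keywords)
    category_hit = bool(profile.subject_categories & book.subject_categories)
    return keyword_hit or category_hit


# Hypothetical profile and books, invented for this example.
profile = InterestProfile(keywords={"oceanography"},
                          subject_categories={"Marine science"})
new_books = [
    BookRecord("Introduction to Oceanography.", {"Earth science"}),
    BookRecord("Fish of the North Sea", {"Marine science"}),
    BookRecord("Cooking for Beginners", {"Food"}),
]
alerts = [book.title for book in new_books if matches(profile, book)]
print(alerts)
```

A keyword profile is easy to set up but sensitive to wording; a category profile depends on the quality of the subject descriptions in the database.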

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
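The principle of simultaneous searching can be sketched as follows. The three target functions are hypothetical stand-ins that return fixed records (they ignore the query), and merging on ISBN is an assumption made for this illustration; real meta-search services talk to each catalogue through its own interface or through a protocol such as Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library_a(query):
    # Hypothetical stand-in for a library catalogue.
    return [{"isbn": "111", "title": "Marine Ecology"}]


def search_bookshop_b(query):
    # Hypothetical stand-in for a bookshop database.
    return [{"isbn": "111", "title": "Marine Ecology"},
            {"isbn": "222", "title": "Coastal Management"}]


def search_library_c(query):
    # Simulates a target system that is not available at the moment.
    raise ConnectionError("catalogue temporarily unavailable")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results."""

    def safe_call(target):
        try:
            return target(query)
        except Exception:
            return []  # an unavailable target simply contributes nothing

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        result_lists = list(pool.map(safe_call, targets))

    merged = {}
    for results in result_lists:
        for record in results:
            merged.setdefault(record["isbn"], record)  # keep first copy of each book
    return list(merged.values())


hits = meta_search("marine", [search_library_a, search_bookshop_b, search_library_c])
print(hits)
```

The sketch shows both points made above: coverage grows with every target that answers, while the search options are limited to what all targets can handle, here a single query string.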

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM                                                   RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic  Library of Congress, British Library, (Amazon)
To search for book titles published before 1990       national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                          Library of Congress, British Library, Infoball
To find the price of a book                           Global Books in Print, Infoball, online bookshops
To be informed regularly about new books              Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not support searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender -> Editor -> Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
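The use of a thesaurus before searching can be sketched as a simple query expansion. The thesaurus entry below is invented for this example and is not taken from any of the systems mentioned in these slides; the relation codes BT, NT, RT and UF are the ones introduced above, and the OR syntax is an assumption about the target search system.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the chosen relations.

    By default synonyms (UF) and narrower terms (NT) are added,
    which tends to increase recall.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))           # sea OR ocean OR "coastal sea"
print(expand_query("sea", THESAURUS, ("BT",)))  # sea OR "body of water"
```

Which relations to follow is a judgment call: synonyms and narrower terms usually help, while broader and related terms can bring in many irrelevant documents and so reduce precision.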

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
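Both directions of citation searching can be illustrated with a small data structure. The documents and reference lists below are invented, and the function names are chosen for this sketch only. The reference lists are what you read in the documents themselves; the citation index is the inverted version of the same data.

```python
# Hypothetical data: each document mapped to the documents it cites.
CITES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    to_visit = list(cites.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            to_visit.extend(cites.get(doc, []))  # the snowball effect
    return found


def build_citation_index(cites):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("seed-2000", CITES))                # ['a-1990', 'b-1995', 'c-1985']
print(build_citation_index(CITES)["seed-2000"])    # ['d-2003', 'e-2004']
```

Snowballing needs only the documents themselves, whereas finding later citing publications requires an index that someone has compiled in advance.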

211

***-

Information retrieval:
using citations (scheme)
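[Scheme: a time axis running from past through now to future. Starting from the seed document, snowball citation searching points to the past and citation indexing points to the future.]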

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
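How such counts follow from the results of a citation search can be shown with a small sketch. The articles, authors and citation pairs are invented; a real count depends on which citing sources the chosen database covers, so any number obtained this way is an estimate.

```python
from collections import Counter

# Hypothetical data: authors per article, and (citing, cited) pairs found by a citation search.
ARTICLES = {
    "art-1": ["Author A"],
    "art-2": ["Author A", "Author B"],
    "art-3": ["Author C"],
}
CITATIONS = [("art-3", "art-1"), ("art-3", "art-2"), ("art-2", "art-1")]


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, articles):
    """Credit every author of a cited article with one citation."""
    counts = Counter()
    for _citing, cited in citations:
        for author in articles.get(cited, []):
            counts[author] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, ARTICLES))
```

Crediting each co-author with a full citation is only one possible convention; fractional counting would give different author totals from the same data.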

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
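Generating the variants listed above can be automated with a few lines. This is a minimal sketch; it assumes that the default file of the site is called index.html, as in the example, whereas a real server may use another default file name.

```python
def url_variants(url: str) -> list:
    """Return the three common ways in which the same start page may be cited."""
    base = url
    if base.endswith("/index.html"):   # assumed default file name
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query and the results combined, because a search system usually treats the three strings as different targets.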

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
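The difference between the two approaches can be made concrete with a sketch that inspects the HTML of a page. The two sample pages are invented and the function names are chosen for this illustration; a search engine does this on its own index, not on single pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of all <a href="..."> hyperlinks in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def cites_by_link(html: str, target_url: str) -> bool:
    """Specialised searching: the URL occurs in a hyperlink field."""
    collector = LinkCollector()
    collector.feed(html)
    return target_url in collector.links


def cites_by_text(html: str, target_url: str) -> bool:
    """Normal searching: the URL occurs anywhere in the page as a string."""
    return target_url in html


target = "http://www.computer.com/directory/subdirectory/web-page.html"
page_with_link = f'<p>See <a href="{target}">this page</a>.</p>'
page_with_mention = f"<p>See {target} for details.</p>"

print(cites_by_link(page_with_link, target), cites_by_text(page_with_link, target))
print(cites_by_link(page_with_mention, target), cites_by_text(page_with_mention, target))
```

The second page mentions the URL without making it a hyperlink, so only the normal string search finds it; this is why the two methods complement each other.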

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
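
A minimal sketch of the tree structure described above, assuming a directory stored as nested categories with links at the leaves. The category names, the example.org links and the function name browse are hypothetical and only serve as illustration.

```python
# Hypothetical directory: categories contain sub-categories, leaves contain links.
directory = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydrology"]},
    },
    "Reference": {"Encyclopedias": ["http://example.org/encyclopedia"]},
}


def browse(tree, path):
    """Follow a list of category names down the tree.

    Returns the names of the sub-categories, or the links stored at a leaf.
    """
    node = tree
    for category in path:
        node = node[category]
    if isinstance(node, dict):
        return sorted(node)   # names of the sub-categories
    return list(node)         # links stored at a leaf


print(browse(directory, []))
print(browse(directory, ["Science", "Biology", "Marine biology"]))
```

No query is formulated at any point: the user only chooses among the categories offered at each level.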

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
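
The two components described above can be sketched as follows. This is a minimal illustration, not the implementation of any real search engine: the pages dictionary stands in for what a crawler ("robot") would collect, and the example.org URLs and function names are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for the output of a crawler.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "History of the encyclopedia",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine biology")))
```

The query is answered from the prebuilt index only; the pages themselves are not contacted at search time.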

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi --
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
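
The idea that a page ranks higher when many pages, and "important" pages, point to it can be sketched with a simplified iterative link analysis. This is an illustration only and not Google's actual ranking: the link graph, the damping value 0.85, the number of iterations and the function name link_rank are all assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from the link graph; links maps a page to the pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D point to C; C points to A; E points to B.
links = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A"], "E": ["B"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph C receives the most links and ranks first. A and B each receive one link, but A ranks above B because its link comes from the highly ranked page C.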

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
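
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration of the principle: the synonym table, the OR notation of the expanded query and the function name expand_query are assumptions, and the way Google handles the tilde internally is not described in the source.

```python
# Hypothetical synonym table; a real system would rely on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)   # no synonyms known: keep the word as it is
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("~cheap hotel brussels"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.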

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
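
Recall and precision, as used above, can be computed for a single search as follows. Precision is the share of the retrieved documents that are relevant; recall is the share of the relevant documents that were retrieved. The document identifiers and the function name precision_recall are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents relevant in total.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this example 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).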

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
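
For readers unfamiliar with two of the missing features, the following sketch shows what truncation and a proximity operator do in systems that support them. The notation librar* for truncation, the default distance of 3 words and the function names are assumptions chosen for illustration.

```python
import re


def words_of(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_truncation(pattern, text):
    """True if some word in the text starts with the stem of a pattern such as 'librar*'."""
    stem = pattern.rstrip("*").lower()
    return any(word.startswith(stem) for word in words_of(text))


def near(word1, word2, text, distance=3):
    """True if word1 and word2 occur within `distance` words of each other."""
    words = words_of(text)
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)


print(matches_truncation("librar*", "The university library catalogue"))
print(near("open", "journals", "open access electronic journals"))
```

Truncation matches library, libraries, librarian and so on with one query term; the proximity operator requires the two words to occur close together, which is stricter than a plain AND.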

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
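
Clustering on the fly can be sketched in a very crude way: only the retrieved snippets are analysed, after the search, and each result is filed under a term that it shares with other results. This is not Vivisimo's method, which is not described in the source; the stop word list, the snippets and the function name cluster_results are assumptions.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that is not in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    token_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        token_sets.append(words - STOPWORDS - query_words)
    # Count in how many snippets each term occurs.
    frequency = Counter(word for tokens in token_sets for word in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        shared = [word for word in tokens if frequency[word] > 1]
        if shared:
            # Most widely shared term; ties are broken alphabetically.
            heading = min(shared, key=lambda word: (-frequency[word], word))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Blaise Pascal biography",
]
clusters = cluster_results("pascal", snippets)
for heading, members in clusters.items():
    print(heading, members)
```

The two meanings of the query end up under separate headings, which is the effect that helps the user to select further.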

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
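
The integration step with removal of repeated results can be sketched as follows, assuming each primary search system returns a ranked list of URLs. The normalisation rules, the example.org URLs and the function names are assumptions; real meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparable key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/a", "http://example.org/b"]
engine_b = ["http://example.org/a/", "http://example.org/c"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Interleaving by position keeps the top results of every source near the top of the merged list.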

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet, and what Internet indexes cover]
• Static texts in the WWW that can be indexed ( = on HTTP server computers):
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
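
The difference between a modified page and a new page can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the tracking service stores one fingerprint per URL and that the page texts have already been downloaded; the names `fingerprint` and `changed_pages` are hypothetical and do not describe how any of the services named in these slides work internally.

```python
import hashlib


def fingerprint(page_text):
    """Return a hash of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(stored, current):
    """Compare stored fingerprints (url -> hash) with current texts (url -> text).

    Returns a list of (url, status) pairs, where status is "new" for a URL
    that was not tracked before and "modified" for a tracked URL whose
    text now has a different fingerprint.
    """
    alerts = []
    for url, text in current.items():
        if url not in stored:
            alerts.append((url, "new"))
        elif stored[url] != fingerprint(text):
            alerts.append((url, "modified"))
    return alerts
```

An alert service would run such a comparison at regular intervals and send the resulting list to the subscriber by email.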

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
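
Matching new books against a stored interest profile can be sketched as follows. This is a simplified illustration, assuming that a profile consists of a list of keywords and a list of subject categories, and that each book record has a title and one category; the field names and the function `new_book_alerts` are hypothetical.

```python
def matches_profile(book, profile):
    """Return True when the book fits the keywords or the categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]
```

A real service would probably also match on authors, series or subject terms; this sketch only shows the two profile types mentioned above.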

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
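
The principle of simultaneous searching can be sketched in Python. This is a minimal illustration, assuming that each target catalogue is represented by a function that takes a query and returns a list of titles; the catalogues below are invented in-memory lists, and one target deliberately fails to show that the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(titles):
    """Build a hypothetical catalogue that matches a query inside titles."""
    def search(query):
        return [t for t in titles if query.lower() in t.lower()]
    return search


def unavailable(query):
    """Simulate a target catalogue that cannot be reached."""
    raise ConnectionError("catalogue not reachable")


def meta_search(query, sources):
    """Send one query to all sources in parallel and merge the results.

    Returns (merged titles without duplicates, names of failed sources).
    """
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                for title in future.result():
                    if title not in results:
                        results.append(title)
            except ConnectionError:
                failed.append(name)
    return results, failed
```

Because only one simple query string is passed to every target, the sketch also shows why the search options of such a service are limited to what all targets have in common.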

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or a thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/PubMed, produced by the
National Library of Medicine (USA),
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail,
but also the source page with its readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
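
The use of a thesaurus to prepare a query can be sketched in Python. This is a minimal illustration, assuming that the thesaurus is available as a small dictionary with the relations UF, BT, NT and RT, and that the target database accepts a Boolean OR between search terms; the entry for "sea" is invented for this example and the function `expand_query` is hypothetical.

```python
# Invented sample entry; a real thesaurus would contain many more terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["tide"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Put terms of several words between quotes to search them as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms with OR mainly increases recall; choosing a narrower term instead of the original term is one way to increase precision.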

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
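
Both directions of citation searching can be sketched in Python. This is a simplified illustration, assuming that the reference list of each document is known; the document names are invented, and the functions `snowball` and `build_citation_index` are hypothetical names that do not describe how a commercial citation index is produced.

```python
# Invented sample data: each document with the documents that it cites.
REFERENCES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": [],
    "X": ["seed"],
    "Y": ["seed", "A"],
}


def snowball(doc, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for ref in references.get(current, []):
                if ref != doc and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index
```

With this sample data, `snowball("seed", REFERENCES)` leads to the older documents A, B and C, while the citation index entry for "seed" lists the more recent documents X and Y.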

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with the seed document;
snowball citation searching points towards the past,
citation indexing points towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
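
The variants of a URL to use in a link search can be generated with a short Python function. This is a minimal sketch, assuming that `index.html` and `index.htm` are the default file names of the web server; the function `url_variants` is hypothetical, and the syntax of the link search itself still depends on the search system that you use.

```python
from urllib.parse import urlsplit

# Assumed default file names; other servers may use other names.
DEFAULT_PAGES = ("index.html", "index.htm")


def url_variants(url):
    """Return the variants of a URL, without the scheme, to search for."""
    parts = urlsplit(url)
    path = parts.path
    original = "//" + parts.netloc + path
    for name in DEFAULT_PAGES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    stem = "//" + parts.netloc + path.rstrip("/")
    variants = []
    for candidate in (original, stem + "/", stem):
        if candidate not in variants:
            variants.append(candidate)
    return variants
```

Each variant can then be used in a separate link query, and the results can be combined.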

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 93

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
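
The difference between the two orderings can be sketched in a few lines of Python. The site names and link counts below are invented for illustration, and a plain count of inbound links is only a simplification of the link analysis that Google applies.

```python
# Hypothetical directory category: (site name, number of inbound links).
# All names and counts are invented for illustration only.
entries = [
    ("Ocean Notes", 87),
    ("Aquatic Atlas", 12),
    ("Marine Biology Hub", 540),
]

def alphabetical(entries):
    """Order used in a plain directory listing: sort by site name."""
    return sorted(entries, key=lambda entry: entry[0].lower())

def by_inbound_links(entries):
    """Simplified link-based order: the site with most inbound links first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

print([name for name, _ in alphabetical(entries)])
print([name for name, _ in by_inbound_links(entries)])
```

In the alphabetical view the least-linked site happens to come first; in the link-based view it drops to the bottom.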

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
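
The two components described above (an automatically built index, and a search function that consults only that index) can be sketched as follows. This is a minimal illustration, not the design of any particular search engine: the `pages` dictionary stands in for what a robot would collect, and its URLs and texts are invented.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler ("robot"); URLs and texts are invented.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Library catalogues on the Internet",
}

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

Note that `search` never touches the pages themselves, only the index built beforehand, which is why such systems can answer quickly but can also be out of date.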

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, a Google WWW search query was limited to a
maximum of 10 words.)
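
The ranking rule above (many pages point to it, or "important" pages point to it) can be illustrated with a small iterative calculation in the spirit of link analysis. This is only a sketch under stated assumptions: the link graph is invented, the damping value 0.85 and the function name `link_rank` are choices made for this example, and the real ranking used by Google is not described here and takes many more signals into account.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    or highly scored pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score

# Invented link graph: page -> pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C scores well because two pages point to it, while D ends up ranked even higher although only one page points to it, because that one page (C) is itself "important".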

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
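
As an illustration of such a query expansion, the sketch below replaces each query word by a group of alternatives taken from a thesaurus. The thesaurus entries are invented, and the `OR` / parentheses syntax is an assumption: it differs between retrieval systems and should be checked in the help pages of the system used.

```python
# Tiny hand-made thesaurus; the entries are invented for illustration.
thesaurus = {
    "pollution": ["contamination"],
    "sea": ["ocean", "marine"],
}

def expand_query(words, thesaurus):
    """Replace each word by an OR group of the word and its synonyms."""
    groups = []
    for word in words:
        alternatives = [word] + thesaurus.get(word, [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(word)
    return " ".join(groups)

print(expand_query(["sea", "pollution", "monitoring"], thesaurus))
# prints: (sea OR ocean OR marine) (pollution OR contamination) monitoring
```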

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
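
For those who generate queries from a program, the query form above can be built as follows. This is a minimal sketch: the helper names are hypothetical, and the base URL and the `q` parameter are assumptions made for illustration, as is the behaviour of the tilde, which depends entirely on the search system.

```python
from urllib.parse import urlencode

def build_query(words, expand=()):
    """Prefix every word listed in `expand` with a tilde."""
    return " ".join("~" + w if w in expand else w for w in words)

def search_url(query, base="https://www.google.com/search"):
    # The base URL and the "q" parameter are assumptions for illustration.
    return base + "?" + urlencode({"q": query})

query = build_query(["word1", "word2", "word3", "word4"], expand={"word2"})
print(query)
print(search_url(query))
```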

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
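
For readers unfamiliar with truncation, the sketch below shows what a retrieval system that does support it would do with a truncated term such as `comput*`: it matches every index term that starts with the given stem. The vocabulary is invented and the `*` symbol is an assumption; other systems use `?`, `$` or `!`.

```python
from fnmatch import fnmatchcase

# Invented vocabulary of index terms.
vocabulary = ["compute", "computer", "computing", "compile", "library", "libraries"]

def truncate_match(pattern, vocabulary):
    """Return the index terms matched by a truncated term such as 'comput*'."""
    return [term for term in vocabulary if fnmatchcase(term, pattern)]

print(truncate_match("comput*", vocabulary))
print(truncate_match("librar*", vocabulary))
```

Without this feature, the searcher has to type the variants (computer, computing, ...) one by one.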

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
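
To give an idea of what clustering "on the fly" means, the toy sketch below groups a handful of result snippets using only the words found in those snippets, with no index prepared beforehand. It is not the method used by Vivisimo or any other system named here: the snippets, the stop word list and the rule "label each snippet with its most widely shared word" are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}

def cluster_snippets(snippets, query):
    """Group each snippet under the word it shares with most other snippets."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = [
        set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS - query_words
        for snippet in snippets
    ]
    # In how many snippets does each word occur?
    doc_freq = Counter(word for words in word_sets for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        if words:
            # sorted() makes the choice deterministic when counts are tied.
            label = max(sorted(words), key=lambda word: doc_freq[word])
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Invented result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal the philosopher: a short biography",
    "Free Pascal compiler for programming",
]
for label, members in cluster_snippets(snippets, "pascal").items():
    print(label, "->", members)
```

With these four snippets, two groups appear, one around "philosopher" and one around "programming", which is the kind of separation a searcher needs for an ambiguous query.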

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you start
using the system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
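
The time saving comes from sending one query to several sources at the same moment instead of one after the other. A minimal sketch of that idea follows; the two "engines" are stub functions that return invented results, so no real search system or network access is involved, and the names `engine_one`, `engine_two` and `meta_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub "search systems": each returns invented results for any query.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Send the same query to every engine at once; collect results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {engine.__name__: pool.submit(engine, query) for engine in engines}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("marine biology", [engine_one, engine_two])
for name, urls in results.items():
    print(name, urls)
```

The user sees a single interface (`meta_search`), whatever the number of sources behind it.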

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
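
Such an integration step can be sketched as follows: ranked lists from several search systems are interleaved, and a URL that was already seen is dropped. The result lists are invented, and the URL normalisation is deliberately crude (an assumption for this example; real systems need more careful rules).

```python
def normalise(url):
    """Crude normalisation so that trivially different URLs compare equal."""
    url = url.strip().lower()
    if url.startswith("http://www."):
        url = "http://" + url[len("http://www."):]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists and drop repeated URLs, keeping the first one seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Invented ranked results from two search systems.
list_a = ["http://www.example.org/a", "http://example.org/b"]
list_b = ["http://example.org/a/", "http://example.org/c"]
print(merge_results([list_a, list_b]))
```

Here the first hit of both systems is the same page under two spellings of its address, so it is shown only once.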

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
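
The tracking idea above can be sketched in a few lines. This is only a minimal illustration, not how any of the named products works: it assumes the service stores a fingerprint of each page from the previous visit and compares it with the text fetched now. The function names and the example URLs are hypothetical, and fetching the pages is left out.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(stored: dict, current: dict) -> list:
    """Compare stored fingerprints with freshly fetched page texts.

    stored:  {url: fingerprint from the previous visit}
    current: {url: page text fetched now}
    Returns the URLs that are new or whose content has changed, sorted.
    """
    alerts = []
    for url, text in current.items():
        # A missing stored fingerprint (new page) also counts as a change.
        if stored.get(url) != fingerprint(text):
            alerts.append(url)
    return sorted(alerts)
```

An alert service would then send the returned URLs to the subscriber by email and store the new fingerprints for the next comparison.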

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
(note the spelling: amazon, NOT amazone)
Subject description is poor.
• A9: http://a9.com
• Both Amazon and A9 allow full-text searching in the contents
of a selection of recent books, free of charge.
• Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of online
shops selling books (as well as music, movies and many
other goods) are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
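
A meta-search service has to combine the result lists returned by the target systems. The sketch below shows one plausible way to do that, merging records by ISBN. It is an assumption for illustration only: the record layout, the source names and the ISBNs are hypothetical, and the real services named on the following slides may work differently.

```python
def normalise_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so that differently formatted ISBNs compare equal."""
    return "".join(ch for ch in isbn if ch.isdigit() or ch in "xX").upper()


def merge_results(result_lists: dict) -> list:
    """Merge record lists from several catalogues, keeping one record per ISBN.

    result_lists: {source name: [{"title": ..., "isbn": ...}, ...]}
    Each merged record lists every source in which the book was found.
    """
    merged = {}
    for source, records in result_lists.items():
        for record in records:
            key = normalise_isbn(record["isbn"])
            if key not in merged:
                merged[key] = {"title": record["title"], "isbn": key, "sources": [source]}
            else:
                merged[key]["sources"].append(source)
    return list(merged.values())
```

Merging on a shared identifier explains both remarks above: coverage grows with every added target, while the search options shrink to what all targets have in common.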

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: a term and its relations to other terms]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
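
The query expansion described above can be sketched as follows. The tiny thesaurus is a hypothetical example invented for illustration (it is not taken from any of the systems named in these slides), and the OR syntax is an assumption about the target database's query language.

```python
# Hypothetical thesaurus fragment using the BT / NT / RT / UF relation codes.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water body"],
        "RT": ["marine life"],
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Expand a search term with thesaurus terms and join them with OR.

    By default only synonyms (UF) and narrower terms (NT) are added, which
    raises recall without drifting far from the original concept.
    """
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding broader terms (BT) would raise recall further at the cost of precision, which is why they are left out of the default here.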

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
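
Both directions of citation searching can be illustrated on a small, made-up set of documents. In this sketch the document identifiers and the reference lists are hypothetical. Following reference lists leads back in time (the snowball effect), while inverting them gives a citation index that leads forward in time.

```python
from collections import defaultdict


def build_citation_index(references: dict) -> dict:
    """Invert reference lists: for each document, list the documents that cite it."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


def snowball(seed: str, references: dict) -> list:
    """Follow reference lists repeatedly, starting from the seed document.

    Returns the older documents reached, in the order in which they are found.
    """
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for cited in references.get(doc, []):
            if cited != seed and cited not in found:
                found.append(cited)
                queue.append(cited)
    return found
```

With reference lists such as `{"D2005": ["C2003"], "C2003": ["B2000", "A1995"]}`, `snowball("C2003", ...)` returns the older documents, and `build_citation_index(...)["C2003"]` returns the newer ones that cite the seed.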

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
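
Generating these variants by hand is error-prone, so a small helper can do it. This is a minimal sketch. The function name is hypothetical, and it assumes that the default page of the site is called index.html or index.htm.

```python
def url_variants(url: str) -> list:
    """Return the three forms of a site URL that are worth searching for:
    with the default index file, with a trailing slash, and without either.
    """
    base = url
    # Strip a default index file name if the URL ends with one.
    for index_file in ("index.html", "index.htm"):
        if base.endswith("/" + index_file):
            base = base[: -len(index_file)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each returned variant is then submitted as a separate link query, and the result lists are combined.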

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
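
The difference between the two approaches can be shown on a small collection of pages held in memory. This is a sketch only: the page names and contents are hypothetical, and a real search engine works on its own index rather than on raw HTML. A page either contains a hyperlink to the target URL, or merely mentions the URL as text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def citing_pages(pages: dict, target_url: str) -> tuple:
    """Split pages into those that link to the target and those that only mention it.

    pages: {page name: HTML source}
    Returns (linking, mentioning) as two lists of page names.
    """
    linking, mentioning = [], []
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target_url in collector.links:
            linking.append(name)
        elif target_url in html:
            mentioning.append(name)
    return linking, mentioning
```

A specialised link search finds only the first group, whereas a normal search for the URL string can also find the second.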

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 94

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
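
The principle "the more links received, the higher the rank" can be sketched in a few lines of Python. This is only an illustration of that principle, not the method actually used by Google; the site names and the list of links are invented for the example.

```python
from collections import Counter

def rank_by_inlinks(sites, links):
    """Sort directory entries by the number of links they receive.

    sites: iterable of site names listed in one directory category
    links: iterable of (source, target) pairs found on the WWW
    """
    inlinks = Counter(target for _source, target in links)
    # Most-linked sites first; ties fall back to alphabetical order.
    return sorted(sites, key=lambda site: (-inlinks[site], site))

# Hypothetical data, for illustration only.
sites = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("z.example", "beta.example"),
]
print(rank_by_inlinks(sites, links))
```

A purely alphabetical directory would list alpha.example first; with this ordering the site that receives the most links moves to the top.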

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
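
The two components described above (an index built automatically from page texts, and a search function on top of it) can be sketched as follows. This is a minimal sketch under simplifying assumptions: the pages are given as a small dictionary instead of being collected by a robot, and the URLs and texts are invented.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages standing in for what a crawler would collect.
pages = {
    "http://a.example/": "Marine science and oceanography",
    "http://b.example/": "Fishing and marine biology",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the index alone, which is why such a system does not need to visit the real Internet at query time.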

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
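
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be combined in an iterative score. The following is a simplified sketch in the spirit of link analysis, not the algorithm as actually run by Google; the damping value, the number of iterations and the tiny link graph are assumptions made for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count more.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: a and b point to c, and c points to a.
scores = link_scores({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

In this graph page a receives only one link, but it comes from the highly scored page c, so a ends up well above page b, which receives no links at all.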

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
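
What such an expansion amounts to can be sketched as follows. How Google implements the tilde internally is not described here; this sketch only shows the general idea of replacing a marked word by a group of alternatives, and the small synonym table is a hypothetical stand-in for a real thesaurus.

```python
def expand_query(query, synonyms):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(token)
    return " ".join(parts)

synonyms = {"car": ["automobile", "vehicle"]}  # hypothetical mini-thesaurus
print(expand_query("cheap ~car rental", synonyms))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.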

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
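
Recall and precision, mentioned above, can be computed for any result set once it is known which documents are relevant. The sketch below uses the usual definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).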

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
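
To make clear what is missing, manual right-hand truncation as offered by many classical retrieval systems can be sketched as follows. The asterisk as truncation symbol and the small word list are assumptions for the example; real systems differ in the symbol they use.

```python
def truncation_matches(pattern, vocabulary):
    """Return the words matched by a term that may end in a truncation symbol *."""
    if not pattern.endswith("*"):
        return [w for w in vocabulary if w == pattern]
    stem = pattern[:-1]
    return [w for w in vocabulary if w.startswith(stem)]

vocabulary = ["librarian", "libraries", "library", "liberty"]  # hypothetical index terms
print(truncation_matches("librar*", vocabulary))
```

Without truncation, the searcher has to type each word variant explicitly, for instance combined with OR.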

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: the user (in / out) works on a client computer + WWW client program, which connects to a WWW server computer; this server in turn reaches, through the Internet / WWW, the WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (in / out) works on a client computer + multi-threaded Internet search client program, which reaches, through the Internet / WWW, the WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
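[Diagram: the user (in / out) works on a client computer + WWW client program, which connects to a WWW server computer; this server in turn reaches, through the Internet / WWW, the WWW server computers with Internet search systems]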

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user (in / out) works on a client computer + multi-threaded Internet search client program, which reaches, through the Internet / WWW, the WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
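
Such an integration with removal of repeated results can be sketched as follows. This is a minimal sketch, not the method of any particular meta-search system: the result lists are invented, the lists are simply interleaved by rank, and two URLs are treated as the same result when they differ only in case or in a trailing slash, which is a crude assumption.

```python
def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation: ignore case and a trailing slash.
                key = results[rank].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical results from two primary search systems.
engine_a = ["http://one.example/", "http://two.example/"]
engine_b = ["http://two.example", "http://three.example/"]
print(merge_results(engine_a, engine_b))
```

The page found by both systems appears only once in the merged list, at the best rank any of the systems gave it.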

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of what is available through the Internet:
- telnet, ftp, ...
- the WWW: static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
- Word files, PDF files
- databases and file archives accessible through the Internet (CGI, ASP,...)
- information accessible only when passwords are used
- rapidly changing information, such as news]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
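
The tracking of modifications in an existing page can be illustrated with a minimal sketch. It assumes the page text has already been retrieved; the function names and sample texts are hypothetical and are not taken from Copernic or from any alert service. The idea is to store a fingerprint of the page and to compare it on the next visit.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint: str, current_text: str) -> bool:
    """True when the current page text differs from the stored version."""
    return fingerprint(current_text) != stored_fingerprint


# Hypothetical texts standing in for two visits to the same page.
first_visit = "Open access journals:  a short list"
second_visit = "Open access journals: a short list, updated May 2005"

stored = fingerprint(first_visit)
unchanged = has_changed(stored, "Open access journals: a short list")
changed = has_changed(stored, second_visit)
print(unchanged, changed)
```

A real service would also have to fetch the page at regular intervals and send the alert; finding entirely new pages needs an index and cannot be done this way.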

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
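
How a stored interest profile could be matched against newly published books can be sketched as follows. This is only an illustration under stated assumptions: the profile layout, the book records and the function name are hypothetical, and nothing here describes how Amazon or any other bookshop implements its service.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True when a title word equals a profile keyword or the category is in the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile stored on the server.
profile = {"keywords": ["aquaculture"], "categories": {"Marine science"}}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Aquaculture in coastal zones", "category": "Agriculture"},
    {"title": "Ocean currents", "category": "Marine science"},
    {"title": "Medieval poetry", "category": "Literature"},
]

alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```

The first book matches on a keyword and the second on its subject category, which corresponds to the two forms of profile listed above.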

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
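
The principle of such a meta-search can be sketched in a few lines. The three target systems below are hypothetical in-memory stand-ins (the ISBN values are placeholders, not real numbers), and the targets are queried one after the other for simplicity, whereas a real service would query them in parallel. The sketch shows why the result depends on the availability of the targets and how duplicate records can be merged.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def search_library_a(query: str) -> List[Record]:
    """Hypothetical library catalogue."""
    catalogue = [
        {"isbn": "111", "title": "Marine ecology"},
        {"isbn": "222", "title": "Library science"},
    ]
    return [r for r in catalogue if query.lower() in r["title"].lower()]


def search_bookshop_b(query: str) -> List[Record]:
    """Hypothetical bookshop database."""
    catalogue = [
        {"isbn": "111", "title": "Marine Ecology"},
        {"isbn": "333", "title": "Marine biology handbook"},
    ]
    return [r for r in catalogue if query.lower() in r["title"].lower()]


def search_library_c(query: str) -> List[Record]:
    """Hypothetical target that is down at the moment."""
    raise ConnectionError("target system unavailable")


def meta_search(
    query: str, targets: Dict[str, Callable[[str], List[Record]]]
) -> Tuple[List[Record], List[str]]:
    """Query every target, skip unavailable ones, merge records by ISBN."""
    merged: Dict[str, Record] = {}
    failed: List[str] = []
    for name, search in targets.items():
        try:
            records = search(query)
        except ConnectionError:
            failed.append(name)
            continue
        for record in records:
            # Keep the first record seen for each ISBN.
            merged.setdefault(record["isbn"], record)
    return list(merged.values()), failed


targets = {
    "library_a": search_library_a,
    "bookshop_b": search_bookshop_b,
    "library_c": search_library_c,
}
results, failed = meta_search("marine", targets)
print(results, failed)
```

Only a plain word query is passed to every target, which reflects the limited search options of meta-search services: features that not all targets support cannot be offered.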

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/PubMed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
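
The procedure described above can be sketched as a small query-expansion routine. The thesaurus entry below is a hypothetical illustration (it is not copied from any of the thesaurus systems mentioned in these slides), and the OR syntax is an assumption: the exact Boolean syntax differs between search systems.

```python
# Hypothetical thesaurus entry; relation codes as in a classical thesaurus.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "NT": ["coastal waters"],  # narrower terms
        "BT": ["water body"],      # broader terms
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR query from a term plus the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put multi-word terms between quotes so that they are searched as a phrase.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms mainly increases recall; adding broader terms as well (relations=("UF", "NT", "BT")) would widen the search further, usually at the cost of precision.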

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
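
Both directions of citation searching can be sketched on a small citation graph. The document identifiers below are hypothetical; in practice the reference lists come from the publications themselves and the reverse lookup is provided by a citation index.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def backward_snowball(doc: str, cites: dict) -> list:
    """Follow reference lists back in time, starting from the seed document."""
    found, queue = [], [doc]
    while queue:
        current = queue.pop(0)
        for ref in cites.get(current, []):
            if ref not in found:
                found.append(ref)
                queue.append(ref)
    return found


def citing_documents(doc: str, cites: dict) -> list:
    """Look up the more recent documents that cite the given document."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older = backward_snowball("seed", CITES)
newer = citing_documents("seed", CITES)
print(older, newer)
```

The first function needs only the documents themselves, while the second needs the complete set of reference lists, which is why a citation index is required to search forward in time.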

211

***-

Information retrieval:
using citations (scheme)
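[Scheme: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document back to older publications; citation indexing leads forward to more recent publications that cite it.]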

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
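
The variants listed above can be generated automatically before each of them is submitted to the search system. This is a minimal sketch; the function name is hypothetical, and only the three variants shown on this slide are covered.

```python
def url_variants(url: str) -> list:
    """Return the address with index.html, with a trailing slash, and without either."""
    base = url
    if base.endswith("index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant still has to be wrapped in the link-search syntax of the chosen search system, as described in its help pages.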

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
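
The difference between the two approaches can be illustrated with a short sketch that inspects a page both ways: once in the hyperlink fields and once in the normal text. The two sample pages are hypothetical, and the class and function names are assumptions made for this illustration; only the standard Python HTML parser is used.

```python
from html.parser import HTMLParser


class LinkAndTextCollector(HTMLParser):
    """Collect hyperlink targets and visible text separately."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def cites(page_html: str, target_url: str) -> tuple:
    """Return (cited through a hyperlink, cited as a string in the text)."""
    parser = LinkAndTextCollector()
    parser.feed(page_html)
    parser.close()
    by_link = target_url in parser.links
    by_text = target_url in "".join(parser.text_parts)
    return by_link, by_text


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
page_with_link = f'<p>See <a href="{TARGET}">this page</a>.</p>'
page_with_mention = f"<p>Described at {TARGET} (no hyperlink).</p>"

result_link = cites(page_with_link, TARGET)
result_mention = cites(page_with_mention, TARGET)
print(result_link, result_mention)
```

The second page would be missed by a search limited to hyperlink fields, which is why the normal search for the URL string is a useful complement.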

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 95

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
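
The "more links received, higher rank" idea can be sketched in a few lines of Python. This is only an illustration: the site names and link data are invented, and the real link analysis by Google is more elaborate than a plain count.

```python
from collections import Counter

def rank_by_inbound_links(links, directory_entries):
    """Sort directory entries by the number of distinct sites linking to them."""
    # Count each (source, target) pair once and ignore self-links.
    inbound = Counter(target for source, target in set(links) if source != target)
    # Most-linked first; ties fall back to alphabetical order.
    return sorted(directory_entries, key=lambda site: (-inbound[site], site))

# Hypothetical link data: (linking site, linked site).
links = [
    ("a.example", "fish.example"),
    ("b.example", "fish.example"),
    ("c.example", "ocean.example"),
    ("a.example", "fish.example"),   # duplicate, counted once
]
entries = ["algae.example", "fish.example", "ocean.example"]
ranked = rank_by_inbound_links(links, entries)
```

An alphabetical listing would put algae.example first; the link-based ordering moves the most linked site to the top instead.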

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
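
The two components described above (an automatically built index and a search function on top of it) can be sketched as follows. This is a minimal illustration, not the design of any particular search engine: the dictionary `pages` stands in for what the robot has collected, and the URLs and texts are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stand-in for pages collected by the robot (hypothetical URLs and texts).
pages = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
    "http://c.example/": "Civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine science")
```

The query is answered from the stored index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.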

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
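
How "important" pages can pass on more weight than unimportant ones is shown in the sketch below. It is a simplified iteration in the spirit of the publicly described PageRank idea, not the algorithm Google actually runs. The mini-web, the damping value of 0.85 and the number of iterations are assumptions chosen for illustration; every linked page must also appear as a key in `graph`.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages: a link from a high-scoring page counts for more."""
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical mini-web: page -> pages it links to.
graph = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}
scores = link_scores(graph)
```

Pages C and D both receive two links, but D ends up with the higher score because one of its links comes from C, which is itself well linked.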

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
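
A query of this form can also be assembled by a small helper, for instance when the same search has to be repeated with different words expanded. The function name and its parameters are invented for this sketch; it only builds the query string and does not contact Google.

```python
def expand_with_tilde(words, expand):
    """Prefix the selected words with '~' and join everything into one query."""
    selected = set(expand)
    return " ".join("~" + w if w in selected else w for w in words)

query = expand_with_tilde(["word1", "word2", "word3", "word4"], expand=["word2"])
```

Here `query` holds the string `word1 ~word2 word3 word4`, the same form as the example above.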

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
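
Recall and precision, as used above, can be computed from a result set once the relevant documents are known. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant). The document identifiers and the two result lists are invented to illustrate the comparison and are not measurements of real systems.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers: d* are relevant, x* are not.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "x4"]
specialised_engine = ["d1", "d2", "d3", "x1"]

p_general, r_general = precision_recall(general_engine, relevant)
p_special, r_special = precision_recall(specialised_engine, relevant)
```

In this invented case the specialised result list scores higher on both measures, which is the situation the slide describes.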

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client computer + WWW client program

WWW server computer

Internet WWW

WWW server computers with Internet search systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet WWW

WWW server computers with Internet search systems

User
Client computer + Multi-threaded Internet search client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client computer + WWW client program

WWW server computer

Internet WWW

WWW server computers with Internet search systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
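
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves at search time. The slides do not describe how Vivisimo does this, so the sketch below substitutes a deliberately naive method: results are grouped under words that occur in more than one snippet. The stop word list, the function name and the snippets are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on", "by"}

def cluster_results(query, snippets, max_clusters=3):
    """Group result snippets under the most frequent non-query words."""
    query_words = set(query.lower().split())
    words_per_snippet = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words_per_snippet.append(words - STOPWORDS - query_words)
    # Count in how many snippets each candidate heading occurs.
    frequency = Counter(w for words in words_per_snippet for w in words)
    ordered = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))
    headings = [w for w, count in ordered if count > 1][:max_clusters]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        # Assign each snippet to the first heading it contains, else "other".
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free Pascal compiler for the Pascal programming language",
]
clusters = cluster_results("pascal", snippets)
```

For the ambiguous query pascal, the four snippets fall into two groups, one under "philosopher" and one under "language", without any processing of the documents before the search.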

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet WWW

WWW server computers with Internet search systems

User
Client computer + Multi-threaded Internet search client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
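
The integration step, merging the ranked lists of several search systems and removing repeated results, might look like the sketch below. It is one possible approach and not the method of any system named in these slides: results are interleaved rank by rank, and two URLs count as the same result when they differ only in scheme, a leading "www." or a trailing slash. The two result lists are invented, using addresses that appear elsewhere in this presentation.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivial URL variants (scheme, 'www.', trailing slash) to one key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping each URL once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.onefish.org/", "http://oceanportal.org/"]
engine_b = ["http://oceanportal.org", "http://onefish.org/", "http://bubl.ac.uk/link/"]
merged = merge_results([engine_a, engine_b])
```

Only the host name is lower-cased, because the path part of a URL can be case-sensitive.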

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet comprises the WWW as well as other services such as telnet, ftp, ...]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
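
Tracking modifications in one existing page can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of any of the services mentioned: the names `fingerprint` and `check_for_update` are hypothetical, the page text is passed in directly rather than downloaded, and a change is detected simply by comparing a stored hash with a new one.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash the page text; whitespace is collapsed so that re-wrapping is not reported."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_update(url: str, new_text: str, stored: dict) -> bool:
    """Return True when the page differs from the version seen at the previous check."""
    new_print = fingerprint(new_text)
    old_print = stored.get(url)
    stored[url] = new_print  # remember the current version for the next check
    return old_print is not None and old_print != new_print


# Hypothetical page address and contents, checked three times.
stored = {}
url = "http://www.example.org/tide-tables.html"
first = check_for_update(url, "Tide tables 2004", stored)       # first visit: nothing to compare
second = check_for_update(url, "Tide   tables 2004\n", stored)  # only the layout changed
third = check_for_update(url, "Tide tables 2005", stored)       # the content changed
```

An alert service would run such a check on a schedule and send an email whenever the result is True. Finding new pages, as opposed to modified ones, requires repeating a stored search query instead.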

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
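
How a stored interest profile can be compared with newly published books is sketched below. This is an illustrative assumption about the principle only, not a description of how Amazon or any other service works: the function `matches_profile`, the profile contents and the book data are all invented for the example.

```python
def matches_profile(title: str, subjects: list, profile: dict) -> bool:
    """A book matches when a profile keyword occurs in its title or a subject category fits."""
    title_words = {word.strip(".,:;!?()").lower() for word in title.split()}
    keyword_hit = any(keyword.lower() in title_words for keyword in profile["keywords"])
    subject_hit = any(subject in profile["subjects"] for subject in subjects)
    return keyword_hit or subject_hit


# Hypothetical profile stored on the server for one user.
profile = {"keywords": {"aquaculture", "mangrove"}, "subjects": {"Marine biology"}}

# Hypothetical newly published books: (title, subject categories).
new_books = [
    ("Mangrove ecosystems: an introduction", ["Ecology"]),
    ("Coastal law", ["Law"]),
    ("Plankton atlas", ["Marine biology"]),
]

alerts = [title for title, subjects in new_books if matches_profile(title, subjects, profile)]
```

The first book is selected through a keyword, the third through its subject category, and the second is not reported.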

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: a term in the centre, with its relations to other terms.]
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
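
The procedure above, first collecting terms from a thesaurus and then formulating the query, can be sketched as follows. This is a minimal illustration: the tiny `THESAURUS` dictionary is invented for the example and not taken from any real thesaurus, the function names are hypothetical, and the AND / OR syntax is an assumption that has to be adapted to the database that is searched.

```python
# Toy thesaurus entry using the relation codes UF, NT, BT and RT (invented content).
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["bay", "gulf"],
        "BT": ["body of water"],
        "RT": ["coast"],
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by its synonyms and narrower terms, without repeats."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(terms, thesaurus):
    """Combine the alternatives for one concept with OR and different concepts with AND."""
    groups = []
    for term in terms:
        variants = expand_term(term, thesaurus)
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")" if len(quoted) > 1 else quoted[0])
    return " AND ".join(groups)


query = build_query(["sea", "pollution"], THESAURUS)
# query is now: (sea OR ocean OR bay OR gulf) AND pollution
```

Adding synonyms and narrower terms with OR mainly increases recall. Passing `relations=("UF",)` to `expand_term` keeps the query narrower, which favours precision.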

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
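
Both directions can be sketched with a small example. The document identifiers and reference lists below are invented, and the function names `snowball` and `build_citation_index` are hypothetical. Real citation indexes are far larger, but the principle of inverting the reference lists is the same.

```python
# Invented reference lists: each document points to the older documents that it cites.
REFERENCES = {
    "seed": ["a1990", "b1995"],
    "b1995": ["a1990"],
    "c2003": ["seed"],
    "d2004": ["seed", "c2003"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, at most `depth` steps from the seed."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        cited = {doc for source in frontier for doc in references.get(source, [])}
        frontier = cited - found
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which later documents cite it?"""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


older = snowball("seed", REFERENCES)                            # back in time
newer = build_citation_index(REFERENCES).get("seed", set())     # forward in time
```

With these data, `older` contains the two earlier publications and `newer` the two later ones that cite the seed document. The size of `newer` is the number of citations that the seed document has received.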

211

***-

Information retrieval:
using citations (scheme)
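[Scheme: a time axis from past to future, with the seed document at "now". Snowball citation searching leads from the seed document back to the past; citation indexing leads from the seed document forward to the future.]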

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
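
Generating these variants can be sketched as follows. The function name `url_variants` is hypothetical, and the sketch assumes that the default document of the web site is named index.html or index.htm, which is not true for every server. The link-search syntax itself differs from one search system to another and is not shown here.

```python
def url_variants(url: str) -> list:
    """List spellings of the same address under which other pages may link to it."""
    without_scheme = url.split("://", 1)[-1]  # drop "http://" when it is present
    base = without_scheme
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant is then searched separately with the link-search option of the chosen search system, and the result sets are combined.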

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
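
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any real search engine: the pages and their URLs are hypothetical, and combining the query words with AND is an assumption made for this sketch.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (AND is assumed here)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as a "robot" might have collected them beforehand.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Pascal programming language",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))  # ['http://example.org/b']
```

The query is answered from the prepared index only; the pages themselves are not visited at search time.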

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
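
The two ranking rules above (many pages point to it, “important” pages point to it) can be illustrated with a small iterative calculation. This is a simplified sketch in the spirit of the published PageRank idea, not the algorithm that Google actually runs; the link graph, the damping value 0.85 and the number of iterations are assumptions chosen for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or high-scoring pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: A and B both point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph C outranks A and B because two pages point to it, and D outranks A and B although only one page points to it, because that page (C) is itself “important”.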

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
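
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration of the principle: the synonym table is hypothetical, and writing the result as an explicit OR group is an assumption of this sketch, not a description of how Google processes the tilde internally.

```python
# Hypothetical synonym table; a real system would rely on a full thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word as it is
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
# prints: cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; "cheap" stays unchanged although the table knows synonyms for it.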

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
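
Recall and precision, mentioned above, are usually computed as simple ratios. The following sketch uses the common textbook definitions; the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of the retrieved documents that are relevant.
    Recall = share of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are in fact relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
# 2 of the 4 retrieved documents are relevant; 2 of the 3 relevant ones were found.
print(precision, recall)
```

A specialised system aims to raise both ratios at the same time for queries within its own subject area.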

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
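
For readers who are not familiar with these two features, the following sketch shows what truncation and stemming do in retrieval systems that offer them. The word list and the suffix rules are hypothetical and deliberately simplistic; real stemmers use far more careful rules.

```python
def truncation_match(pattern, words):
    """Return the words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern.lower()]


def naive_stem(word):
    """Very rough stemming: strip a few common English suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", words))
# prints: ['library', 'libraries', 'librarian']

print({w: naive_stem(w) for w in ["search", "searches", "searching", "searched"]})
# all four words are reduced to 'search'
```

With manual truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form by itself.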

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
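
What a proximity operator such as NEAR does can be sketched as follows. This is an illustration only: the function name, the default distance of 5 words and the sample sentence are assumptions, and systems that offer NEAR each define the allowed distance in their own way.

```python
import re


def near(text, word1, word2, max_distance=5):
    """Return True if word1 and word2 occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


text = "Marine biology is the study of life in the sea"
print(near(text, "marine", "biology", max_distance=1))  # True: adjacent words
print(near(text, "marine", "sea", max_distance=3))      # False: 9 words apart
```

A plain AND query would accept the document in both cases; the proximity operator only accepts it when the words are close together.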

96

**--

Meta-search systems:
scheme 1
[Diagram: the user (in / out) works on a client computer with a WWW client program; this reaches, through the Internet / WWW, a WWW server computer, which in turn queries WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (in / out) works on a client computer with a multi-threaded Internet search client program, which queries, through the Internet / WWW, WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user (in / out) works on a client computer with a WWW client program; this reaches, through the Internet / WWW, a WWW server computer, which in turn queries WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user (in / out) works on a client computer with a multi-threaded Internet search client program, which queries, through the Internet / WWW, WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
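
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions: the result lists are hypothetical, the results are simply interleaved by rank, and two URLs are treated as the same hit when they differ only in a leading "www.", the case of the host name, or a trailing slash. Real meta-search systems may merge and compare results quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (simplistic, hypothetical rule).
    Only the host name is lower-cased, because paths can be case-sensitive."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked lists from several search systems, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/a/", "http://example.com/b"]
engine_b = ["http://example.com/b", "http://example.org/a", "http://example.net/c"]
print(merge_results(engine_a, engine_b))
# prints: ['http://www.example.org/a/', 'http://example.com/b', 'http://example.net/c']
```

Five hits from two systems are reduced to three distinct ones, while the best-ranked hits of each system stay near the top.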

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is available through the Internet and the WWW]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
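The difference between a modified page and a new page can be made concrete with a small sketch. It assumes that the page texts have already been fetched and are passed in as a dictionary. The function names and the use of a SHA-256 digest are choices for this illustration only, not a description of how any of the services mentioned here works.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(current_pages, stored_fingerprints):
    """Compare pages (url -> text) with stored digests; report new and modified URLs."""
    new, modified = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored_fingerprints:
            new.append(url)
        elif stored_fingerprints[url] != digest:
            modified.append(url)
        # Remember the latest digest for the next check.
        stored_fingerprints[url] = digest
    return new, modified

store = {}
first = check_pages({"http://example.org/a": "version 1"}, store)
second = check_pages({"http://example.org/a": "version 2",
                      "http://example.org/b": "hello"}, store)
```

On the second check, page `a` is reported as modified and page `b` as new; an alert service would then send this report by email to the subscriber.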

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
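How a stored interest profile can be matched against newly published books is sketched below. The profile layout, the book records and the matching rule (one shared keyword in the title, or the same subject category) are assumptions made for this example; real services may match in quite different ways.

```python
def matches_profile(book, profile):
    """True if the book shares a keyword or a subject category with the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical profile and book records, for illustration only.
profile = {"keywords": ["aquaculture"], "categories": ["Oceanography"]}
new_books = [
    {"title": "Aquaculture Systems", "category": "Agriculture"},
    {"title": "Tides and Currents", "category": "Oceanography"},
    {"title": "Medieval Poetry", "category": "Literature"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```

The first book matches through a keyword and the second through its subject category; the third book triggers no alert.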

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: relations of a term in a thesaurus]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
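The procedure described above can be sketched in a few lines. The thesaurus below is tiny and made up for the example, and the rule of combining terms with OR is an assumption; adding synonyms (UF) and narrower terms (NT) tends to increase recall, while the choice of relations has to be weighed against precision.

```python
# A tiny, made-up thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["lagoon"], "BT": ["water body"], "RT": ["coast"]},
}

def expand_query(term, relations=("UF", "NT")):
    """Build a Boolean OR query from a term plus selected related terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea")
broad = expand_query("sea", relations=("UF", "BT"))
```

The resulting string can then be pasted into the search box of a database that supports Boolean searching; the exact query syntax depends on that database.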

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
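Both directions of citation searching can be shown on a small, invented citation graph. The document names and the data structure are assumptions for this sketch: `snowball` follows reference lists towards older work, while `citation_index` inverts the reference lists, which is in essence what a citation index offers.

```python
# Hypothetical data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(doc, cites):
    """Follow reference lists backwards in time, collecting all older documents."""
    found, todo = set(), [doc]
    while todo:
        for ref in cites.get(todo.pop(), []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found

def citation_index(cites):
    """Invert the reference lists: for each document, which documents cite it?"""
    cited_by = {}
    for doc, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(doc)
    return cited_by

older = snowball("seed", CITES)
newer = citation_index(CITES).get("seed", [])
```

The length of a list in the inverted index, for instance `len(citation_index(CITES)["old1"])`, gives the number of citations received by that document within the covered collection.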

211

***-

Information retrieval:
using citations (scheme)
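[Scheme: a time axis from past to future with the seed document at “now”. Snowball citation searching leads from the seed document back to the past; citation indexing leads from the seed document forward to the future.]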

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
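The variants listed above can be generated automatically before running a link search. The helper below is a minimal sketch; its name and the assumption that the default page is called `index.html` are illustrative only, since a web server can be configured with other default file names.

```python
def url_variants(url):
    """Return the common spellings under which a site's start page may be linked."""
    base = url
    # Strip a trailing default page or slash to get the bare form.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then submitted as a separate link query, and the results are added up or merged.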

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 97

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
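The simple statement above (more received links, higher rank) can be sketched as follows. This is only an illustration of counting inbound links; the site names and the link list are invented, and it is not the actual procedure used by Google.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Sort directory entries by number of inbound links, most linked first.

    entries: site names listed in one directory category
    links:   (source, target) pairs observed on the web
    """
    inlinks = Counter(target for source, target in links if source != target)
    # ties fall back to alphabetical order, as in the plain directory listing
    return sorted(entries, key=lambda site: (-inlinks[site], site))


entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]
print(rank_by_inlinks(entries, links))
```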

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: a global subject directory, covering the complete WWW,
can lead to
- a directory limited to sources in/of a country or region
- a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
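The idea that a page ranks higher when many pages, or “important” pages, point to it can be illustrated with a simplified link-score iteration. This is a sketch of the general principle only, not the algorithm actually used by Google; the damping value, the number of iterations and the small link graph are assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing score along hyperlinks.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # a page without outgoing links spreads its score evenly
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
scores = link_scores(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, C scores well because two pages point to it, and D scores well because the “important” page C points to it.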

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
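What such an expansion amounts to can be sketched as follows. The synonym table is hypothetical and only serves the illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            if len(group) > 1:
                expanded.append("(" + " OR ".join(group) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the word
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.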

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
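Recall and precision, mentioned above, can be computed for one result set as follows. This is a small sketch with invented document identifiers; in practice the complete set of relevant documents is rarely known, so recall can only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))
print(precision_recall(specialised_engine, relevant))
```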

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram of layers, from top to bottom:
User;
an Internet meta-search system;
Internet search system 1 and Internet search system 2,
each with its own collected database (1 and 2);
WWW pages]
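The relation between a meta-search system and the underlying search systems can be sketched as one query that is sent to several systems in parallel. The two search functions below are hypothetical stand-ins that return fixed results; a real meta-search system would send requests to remote search engines.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two search systems, each with its own database.
def search_system_1(query):
    return ["http://a.example/", "http://b.example/"]


def search_system_2(query):
    return ["http://b.example/", "http://c.example/"]


def meta_search(query, systems):
    """Send one query to several search systems in parallel.

    Returns a dict: system name -> list of result URLs.
    """
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        return {name: future.result() for name, future in futures.items()}


systems = {"system 1": search_system_1, "system 2": search_system_2}
results = meta_search("marine science", systems)
for name, urls in results.items():
    print(name, urls)
```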

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
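Clustering on the fly can be illustrated with a very simple method that groups result snippets under a word they share. This is only a sketch with invented snippets and a tiny stop word list; the actual clustering method of Vivisimo is not described in these slides and is certainly more advanced.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "his", "of", "the", "to"}


def cluster_results(query, snippets):
    """Group result snippets under the shared word that best describes each."""
    ignore = STOPWORDS | set(re.findall(r"\w+", query.lower()))
    words_per_snippet = [
        set(re.findall(r"\w+", s.lower())) - ignore for s in snippets
    ]
    # document frequency: in how many snippets does each word occur?
    df = Counter(word for words in words_per_snippet for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        if words:
            # pick the most widely shared word; ties are broken alphabetically
            label = max(words, key=lambda w: (df[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal compiler for the Pascal programming language",
]
for label, group in cluster_results("pascal", snippets).items():
    print(label, group)
```

The grouping is computed from the retrieved snippets only, at search time, so no pre-processing of the documents is needed.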

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
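
The integration of results with removal of repeated results can be sketched as follows. This is only an illustration: it assumes that each primary search system returns a ranked list of URLs and that two URLs that differ only in a trailing slash point to the same document. The function name merge_results is hypothetical.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Assumption: a trailing slash does not change the document.
                key = results[rank].rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

engine_1 = ["http://example.org/a", "http://example.org/b"]
engine_2 = ["http://example.org/b/", "http://example.org/c"]
merged = merge_results([engine_1, engine_2])
```

Interleaving by rank keeps the top hits of every system near the top of the merged list.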

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
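
Tracking changes in an automated way can be sketched as follows. This is a minimal illustration, assuming that the text of each page has already been fetched and that a stored digest of the previous version is available; the names fingerprint and changed_pages are hypothetical and do not describe any of the services mentioned.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    `previous` maps URL -> fingerprint from the last visit;
    `current` maps URL -> page text fetched now (fetching is left out here).
    """
    alerts = []
    for url, text in current.items():
        if url not in previous:
            alerts.append((url, "new"))
        elif previous[url] != fingerprint(text):
            alerts.append((url, "modified"))
    return alerts

stored = {"http://example.org/news": fingerprint("old text")}
fetched = {
    "http://example.org/news": "updated text",
    "http://example.org/report": "first version",
}
alerts = changed_pages(stored, fetched)
```

An alert service would run such a comparison periodically and send the resulting list to the subscriber by email.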

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic: ?

To find book titles published before 1990: ?

To find a book title through a title search: ?

To find the price of a book: ?

To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or a thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
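
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical thesaurus (THESAURUS) with BT, NT, RT and UF relations, and a target database that accepts OR between search terms; real thesaurus systems hold far more terms.

```python
# Hypothetical miniature thesaurus, for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea", THESAURUS)
broader = expand_query("sea", THESAURUS, relations=("BT",))
```

Adding synonyms and narrower terms mainly increases recall; choosing more specific terms from the thesaurus instead mainly increases precision.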

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
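
Both directions of citation searching can be sketched as follows. This is a minimal illustration, assuming a small hypothetical set of citation data (CITES) in which each document is mapped to the documents that it cites; the function names are hypothetical as well.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["older"],
    "new-1": ["seed"],
    "new-2": ["seed", "old-1"],
}

def cited_by_seed(seed, cites):
    """Snowball backwards in time: follow reference lists transitively."""
    found, todo = [], list(cites.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(cites.get(doc, []))
    return found

def citing_seed(seed, cites):
    """Go forward in time: invert the data, as a citation index does."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)

older = cited_by_seed("seed", CITES)
newer = citing_seed("seed", CITES)
```

The first function needs only the reference lists of the documents at hand; the second one needs data about many other documents, which is what a citation index provides.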

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with a seed document; snowball citation searching leads to the past, citation indexing leads to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
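
The variants above can be generated with a small helper. This is a minimal sketch, assuming that only the default file name index.html and the trailing slash vary; the function name url_variants is hypothetical, and the query syntax for link searching still depends on the search system that is used.

```python
def url_variants(url):
    """List the forms under which other pages may link to the same document."""
    # Drop the scheme (http://), as in the variants listed above.
    base = url.split("://", 1)[-1]
    variants = [base]
    if base.endswith("/index.html"):
        directory = base[: -len("index.html")]  # keeps the trailing slash
        variants.append(directory)
        variants.append(directory.rstrip("/"))
    return ["//" + variant for variant in variants]

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant is then searched separately and the results are combined, because a search system may treat the three forms as different link targets.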

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 98

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
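The two components described above (a machine-built index of page texts, and a search system that queries it) can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the page URLs and texts are invented, and a real robot would collect the pages from the Internet instead of reading them from a dictionary.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain all words of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a robot would have collected.
pages = {
    "http://example.org/a": "open access marine science journals",
    "http://example.org/b": "marine biology directory",
    "http://example.org/c": "computer science journals",
}
index = build_index(pages)
print(search(index, "marine science"))
```

The query is answered from the prepared index only; no page is fetched at query time, which is why such systems are fast but can be out of date.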

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
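The two ranking rules above (many pages point to it, "important" pages point to it) can be illustrated with a simplified, PageRank-style iteration. This is a sketch only, not Google's actual algorithm: the function name `link_rank`, the damping value 0.85 and the three-page link graph are assumptions made for illustration.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count more."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: a and b both point to c, and c points back to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this toy graph, page c ranks highest because two pages point to it, and page a ranks above page b because the "important" page c points to a while nothing points to b.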

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
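What the tilde asks the system to do can be imitated by hand with a small synonym table. The sketch below only illustrates the idea of automatic query expansion: the synonym list is invented, and nothing is implied about how Google selects its synonyms.

```python
# Hypothetical synonym table; a real system would use a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("cheap ~car rental", SYNONYMS))
```

Only the word marked with the tilde is expanded; the other words of the query are passed on unchanged.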

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.

• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.

• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
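Recall and precision, named above as the two ways in which a specialised system can give better result sets, have simple definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch with invented document identifiers:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are really relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In practice the set of all relevant documents on the WWW is unknown, so recall can only be estimated, for instance by comparing the results of several search systems.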

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
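To make clear what is missing: with manual truncation, a searcher types the beginning of a word followed by a truncation symbol, and the system matches every word that starts in that way. The sketch below imitates this on a small, invented word list; the `*` symbol and the function name are assumptions, and systems that do offer truncation may use another symbol.

```python
def truncation_match(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the words matched by a pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [word for word in vocabulary if word.lower().startswith(stem)]
    # Without a truncation symbol, only the exact word matches.
    return [word for word in vocabulary if word.lower() == pattern.lower()]


# Hypothetical vocabulary of an index.
vocabulary = ["library", "libraries", "librarian", "liberal"]
matches = truncation_match("librar*", vocabulary)
print(" OR ".join(matches))
```

When a system does not support truncation, the searcher can obtain a similar effect by typing the variants explicitly and combining them with OR, as the printed line suggests.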

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
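A proximity operator such as NEAR requires that two words occur within a given number of words of each other, which is stricter than AND but looser than an exact phrase. The following sketch shows the test that such an operator performs on one document; the default distance of 5 words is an assumption, as each system that offers NEAR defines its own window.

```python
def near(text: str, word_a: str, word_b: str, distance: int = 5) -> bool:
    """True when word_a and word_b occur at most `distance` words apart."""
    tokens = text.lower().split()
    positions_a = [i for i, token in enumerate(tokens) if token == word_a.lower()]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)


text = "open access journals in marine science"
print(near(text, "open", "journals", distance=2))
print(near(text, "open", "science", distance=2))
```

The first test succeeds because the two words are 2 positions apart; the second fails because they are 5 positions apart, although a plain AND query would retrieve the document in both cases.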

96

**--

Meta- search systems:
scheme 1
[Diagram: User; client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

97

**--

Meta- search systems:
scheme 2
[Diagram: User; client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User; client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; arrows labelled In and Out]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
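Clustering "on the fly" means that the groups are derived from the retrieved results themselves, at query time. The sketch below shows one very simple way to do this, for an ambiguous query such as pascal: each result is filed under the word that it shares with the largest number of other results. This is not Vivisimo's method, which the slides do not describe; the stop-word list, the snippets and the labelling rule are all assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query: str, snippets: list[str]) -> dict[str, list[str]]:
    """Group result snippets under their most widely shared word."""
    query_words = set(query.lower().split())

    def content_words(snippet: str) -> list[str]:
        return [w for w in snippet.lower().split()
                if w not in STOPWORDS and w not in query_words]

    # Count in how many snippets each word occurs.
    doc_freq: Counter[str] = Counter()
    for snippet in snippets:
        doc_freq.update(set(content_words(snippet)))

    clusters: dict[str, list[str]] = defaultdict(list)
    for snippet in snippets:
        candidates = content_words(snippet)
        if not candidates:
            clusters["other"].append(snippet)
            continue
        # Most widely shared word wins; ties are broken alphabetically.
        label = max(candidates, key=lambda w: (doc_freq[w], w))
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pascal philosopher biography",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, "->", members)
```

The snippets fall into a "programming" group and a "philosopher" group without any processing of the documents before the search.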

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
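A client-based meta-search program sends the same query to several search systems in parallel threads and collects the answers on the client computer. The sketch below shows only that principle; it does not reflect how Copernic works internally, and the two search functions are hypothetical stand-ins that return fixed example URLs instead of contacting real servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real search systems: each maps a query to a list of URLs.
def search_system_1(query):
    return [f"http://example.org/{query}/a", f"http://example.org/{query}/b"]

def search_system_2(query):
    return [f"http://example.org/{query}/b", f"http://example.net/{query}/c"]

def client_meta_search(query, systems):
    """Send the same query to every search system in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        # pool.map keeps the order of `systems`, whichever thread finishes first.
        result_lists = list(pool.map(lambda search: search(query), systems))
    return {system.__name__: results for system, results in zip(systems, result_lists)}

print(client_meta_search("thesaurus", [search_system_1, search_system_2]))
```

The total waiting time is then roughly that of the slowest search system rather than the sum of all of them.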

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
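The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular meta-search system: the normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) and the round-robin merging order are choices made here for the example.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked lists from several search systems, keeping first occurrences."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com"]
print(merge_results(engine_1, engine_2))
```

Here the two spellings of the Dogpile address are recognised as the same result, so the merged list contains three entries instead of four.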

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
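The difference between a modified page and a new page can be made concrete with a small sketch. It assumes that the tracking service stores a fingerprint (here a SHA-256 digest) of each page it has seen; real alert services may work differently, and the URLs and page texts below are hypothetical. Fetching the pages is left out.

```python
import hashlib

def fingerprint(page_text):
    """Return a stable digest of a page's content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(known, current):
    """Compare stored fingerprints with freshly fetched page texts.

    known:   dict URL -> fingerprint from the previous visit
    current: dict URL -> page text fetched now
    """
    report = {"new": [], "modified": [], "unchanged": []}
    for url, text in current.items():
        if url not in known:
            report["new"].append(url)
        elif known[url] != fingerprint(text):
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report

# Hypothetical state from the previous visit and from today.
known = {
    "http://example.org/a.html": fingerprint("old text"),
    "http://example.org/b.html": fingerprint("same text"),
}
current = {
    "http://example.org/a.html": "updated text",
    "http://example.org/b.html": "same text",
    "http://example.org/c.html": "brand new page",
}
print(check_pages(known, current))
```

An alert service would then send the "new" and "modified" lists to the subscriber by email.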

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
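How terms found in a thesaurus can feed a query may be sketched as below. The thesaurus entry is a tiny hand-made example, not taken from any of the systems mentioned, and the output syntax (OR between terms, double quotes around phrases) is an assumption: the actual query syntax depends on the database that is searched.

```python
# A tiny hand-made thesaurus entry; real thesaurus systems are far larger.
THESAURUS = {
    "sea": {
        "BT": ["water body"],   # broader term
        "NT": ["inland sea"],   # narrower term
        "RT": ["coast"],        # related term
        "UF": ["ocean"],        # used for (synonym)
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term plus its selected thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms with OR mainly raises recall; choosing a more specific term from the thesaurus instead of a vague one is what helps precision.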

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
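Both directions of citation searching can be shown on a small example. The documents and reference lists below are hypothetical; the sketch only illustrates that a citation index is the inversion of the reference lists, and that snowballing follows those lists backwards in time.

```python
from collections import defaultdict

# Hypothetical documents and their reference lists (document -> documents it cites).
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1998", "paper_1995"],
    "paper_1998": ["paper_1995"],
}

def build_citation_index(references):
    """Invert reference lists: cited document -> documents that cite it."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return dict(index)

def snowball(seed, references):
    """Follow references backwards in time, collecting all older documents reached."""
    found, to_visit = [], list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found

citation_index = build_citation_index(REFERENCES)
print(snowball("seed_2000", REFERENCES))   # older work reached from the seed
print(citation_index["seed_2000"])         # more recent work that cites the seed
```

The reference list of a document is fixed at publication, but its entry in the citation index keeps growing as new publications cite it.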

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
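The variants listed above can be generated automatically before a link search. The helper below is a hypothetical sketch; it assumes that the default page of the site is called index.html (or index.htm), which is common but not universal.

```python
def url_variants(url):
    """List the common spellings under which the same page may be linked."""
    base = url
    # Strip a trailing default page name, if present.
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link query, and the result lists are combined.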

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know,
we can NOT ONLY use an Internet index search engine with
specialised searching in the hyperlink fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
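
What such a "normal" text search does can be imitated on a small scale. The sketch below is only an illustration on toy data: the function name `pages_citing` and the dictionary of pages are hypothetical, and excluding the cited page itself is a choice of this sketch, not a feature of any particular search engine.

```python
def pages_citing(url, pages):
    """Return addresses of pages whose text mentions the given URL.

    pages: dict mapping a page address to the text of that page.
    The cited page itself is left out of the result.
    """
    # Compare without scheme and trailing slash, ignoring case
    # (lenient matching; URL paths can be case-sensitive in practice).
    needle = url.split("//", 1)[-1].rstrip("/").lower()
    return sorted(
        address
        for address, text in pages.items()
        if needle in text.lower() and needle not in address.lower()
    )
```

For example, given a small collection with one page that quotes the URL in its text, only the address of that page is returned.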

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 99

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram with the components: the complete WWW;
a global subject directory, which can lead to
a directory limited to sources in/of a country or region,
or to a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
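
The two parts described above (an index built automatically from page texts, and a search function on top of it) can be sketched with an inverted index. This is a toy illustration only: the function names are hypothetical, the "robot" that collects pages is replaced by a ready-made dictionary, and combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word.

    pages: dict mapping a URL to the text of the page (as a robot
    would have collected it beforehand).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The query is answered from the index alone; no page is fetched at search time, which is why such systems can respond quickly but can also return outdated links.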

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one
U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi, has been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web no longer builds
its own unique database of WWW sites, but relies
on the same WWW database as
the earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
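
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified iterative link analysis in the spirit of published methods such as PageRank. This is a sketch under stated assumptions, not Google's actual implementation; the function name `link_rank` and the parameter values are hypothetical choices for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links they receive.

    links: dict mapping a page to the list of pages it links to.
    A link from a highly scored page contributes more than a link
    from a page with a low score.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a small example where pages a, b and d all link to c, and c links only to a, page c receives the highest score, and a scores higher than b because its single incoming link comes from an "important" page.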

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
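
The effect of the tilde can be imitated as follows. How Google selects synonyms is not described here, so the synonym table in this sketch is a hypothetical toy example, and the function name `expand_query` is an assumption made for the illustration.

```python
# Hypothetical toy synonym table; a real system would use a large one.
SYNONYMS = {"car": ["automobile", "vehicle"]}


def expand_query(query, synonyms):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

With the toy table, the query `cheap ~car rental` becomes `cheap (car OR automobile OR vehicle) rental`; words without a tilde are left untouched.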

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
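
Recall and precision, used above to compare result sets, can be computed once it is known which documents are relevant. A minimal sketch with the standard definitions; the function name `precision_recall` is chosen for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a search returns four documents of which two are relevant, while three relevant documents exist in total, then precision is 2/4 and recall is 2/3.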

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
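
To show what truncation means: a truncated term such as `librar*` stands for every word that starts with `librar`. When a search system does not support this, the searcher can expand the term by hand against a word list and combine the variants with OR. The sketch below assumes a hypothetical word list and uses the wildcard matching of the Python standard library.

```python
from fnmatch import fnmatchcase


def truncation_matches(pattern, vocabulary):
    """Return the words in the vocabulary that match a truncated term.

    pattern: a term with * as truncation symbol, for instance "librar*".
    """
    return sorted(
        word for word in vocabulary
        if fnmatchcase(word.lower(), pattern.lower())
    )
```

The matching words can then be joined into one query, for instance `library OR libraries OR librarian`.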

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
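
A proximity operator such as NEAR requires that two words occur close to each other in a document. A minimal sketch of that test, assuming a hypothetical function name `near` and a maximum distance counted in words:

```python
import re


def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in positions1
        for j in positions2
    )
```

Such a test is stricter than a plain AND of the two words, which is satisfied even when the words are far apart in a long document.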

96

**--

Meta- search systems:
scheme 1
[Diagram with the components: user (in / out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram with the components: user (in / out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: user (in / out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
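
The relations above imply that a meta-search system has to combine the result lists returned by several search systems. The sketch below shows one possible way, under assumptions of this example: the function name `merge_results` is hypothetical, the result lists are given as ranked lists of URLs, duplicates are recognised only by ignoring a trailing slash, and the combined ranking sums reciprocal ranks. Real meta-search systems may merge differently.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several search systems into one list.

    A URL found by several systems, or ranked high by one of them,
    ends up higher in the merged list.
    """
    scores = {}
    for results in result_lists:
        seen = set()
        for position, url in enumerate(results, start=1):
            key = url.rstrip("/")
            if key in seen:
                continue  # count each URL once per search system
            seen.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A URL that appears in the lists of two systems is shown once in the merged list, which is one of the ways a meta-search system saves the user time.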

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
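
The idea of grouping hits under headings on the fly can be sketched with a very naive approach: after the results arrive, group their titles under the words they share. This is NOT the method used by Vivisimo, which is not described here; the stopword list, the function name and the sample titles are all assumptions made for illustration.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "in", "and", "for", "on"}


def cluster_by_keyword(titles, query_terms):
    """Group result titles under the words they share (a naive stand-in for real clustering)."""
    skip = STOPWORDS | {t.lower() for t in query_terms}
    clusters = defaultdict(list)
    for title in titles:
        for word in sorted(set(re.findall(r"\w+", title.lower()))):
            if word not in skip:
                clusters[word].append(title)
    # Keep only headings that are shared by at least two results.
    return {word: hits for word, hits in clusters.items() if len(hits) >= 2}


titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Rainforest animals: the jaguar",
]
clusters = cluster_by_keyword(titles, ["jaguar"])
for heading, hits in clusters.items():
    print(heading, hits)
```

No index of the documents is built beforehand; the grouping uses only the titles returned for this one search.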

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
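
The time saving comes from sending one query to several sources at the same time. A minimal sketch of this, using threads from the Python standard library, is given below. The two search functions are hypothetical stand-ins that return fixed lists; a real meta-search system would contact real search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems: each returns a list of URLs.
def search_index_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def search_index_b(query):
    return [f"http://b.example/{query}/1"]


def meta_search(query, sources):
    """Send the same query to every source in parallel; return results per source name."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("thesaurus", {"A": search_index_a, "B": search_index_b})
print(results)
```

The user formulates the query once; the program takes care of the individual sources.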

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
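
Integration with removal of repeated results can be sketched as follows: the ranked lists are interleaved, and a URL that has already been seen is skipped. The normalisation rules are assumptions chosen for illustration (for example, lower-casing the whole URL is a simplification, since paths can be case-sensitive).

```python
def normalise(url):
    """Reduce trivial URL differences so that repeated hits can be recognised."""
    url = url.strip().lower()  # simplification: paths may in fact be case-sensitive
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists (round-robin) and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.doaj.org/", "http://www.copac.ac.uk/"]
engine_2 = ["http://doaj.org", "http://www.scirus.com"]
merged = merge_results([engine_1, engine_2])
print(merged)
```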

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
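
How a tracking service can notice that a page was modified may be sketched with a content fingerprint: the service stores a hash of the page text and compares it at the next visit. This is an assumption about one possible technique, not a description of any named service; the page text is passed in directly, so nothing is fetched from the network.

```python
import hashlib


def fingerprint(page_text):
    """Return a fingerprint (SHA-256 hex digest) of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_for_update(url, page_text, known_fingerprints):
    """Compare the current fingerprint with the stored one and report the status."""
    current = fingerprint(page_text)
    previous = known_fingerprints.get(url)
    known_fingerprints[url] = current  # remember the latest version
    if previous is None:
        return "new"
    return "changed" if previous != current else "unchanged"


store = {}
url = "http://www.example.org/news.html"  # hypothetical page
print(check_for_update(url, "Version 1", store))
print(check_for_update(url, "Version 1", store))
print(check_for_update(url, "Version 2", store))
```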

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ states that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
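
The use of a thesaurus to find more search terms can be sketched as a simple query expansion. The thesaurus entry below is invented for illustration, and the function name `expand_query` is hypothetical; a real thesaurus system would supply the related terms.

```python
# A tiny hypothetical thesaurus entry, using the relation codes BT, NT, RT, UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, relations=("UF", "NT", "RT")):
    """Build an OR query from a term plus the terms found via the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
```

Adding synonyms, narrower and related terms tends to increase recall; leaving out the broader terms, as in the default setting here, avoids widening the search too much.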

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
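
The two directions of citation searching can be sketched with a small data structure. The citation data below are invented for illustration: each document is mapped to the documents that it cites. Following these lists gives the snowball effect towards older work, while inverting them gives what a citation index offers, namely the more recent documents that cite the seed.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["older-1"],
    "new-1": ["seed"],
    "new-2": ["seed", "new-1"],
}


def snowball(document, depth=2):
    """Follow reference lists backwards in time, up to 'depth' steps."""
    found, frontier = [], [document]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in CITES.get(doc, []):
                if cited not in found and cited != document:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def citing_documents(document):
    """Look up later documents that cite 'document' (what a citation index provides)."""
    return sorted(doc for doc, refs in CITES.items() if document in refs)


print(snowball("seed"))
print(citing_documents("seed"))
```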

211

***-

Information retrieval:
using citations (scheme)
Time axis: Past, Now, Future
Seed document
Snowball citation searching: towards the past
Citation indexing: towards the future

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
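
The variants listed above can be generated mechanically before pasting them into a link search. The following is only a minimal sketch: the function name `url_variants` and the assumption that the default document is called `index.html` are illustrative, and each search system may normalise URLs in its own way.

```python
def url_variants(url):
    """Return the spellings under which one and the same page may be cited."""
    # Drop the scheme (http:, https:) and any leading slashes.
    rest = url.split("://", 1)[-1].lstrip("/")
    variants = ["//" + rest]
    index_name = "index.html"  # assumed name of the default document
    if rest.endswith("/" + index_name):
        directory = rest[: -len(index_name)]  # keeps the trailing slash
        variants.append("//" + directory)
        variants.append("//" + directory.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each printed variant can then be searched separately, and the result sets combined.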

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can NOT ONLY use an
Internet index search engine with specialised searching in
the hyperlink fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
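
The idea of "normal" searching for a URL string can be imitated locally. This is a small sketch under stated assumptions: the dictionary of page addresses and page texts is hypothetical, and a real search engine would look the string up in its index instead of scanning every text.

```python
def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the given URL string."""
    return sorted(
        address
        for address, text in pages.items()
        if url in text and address != url  # a page does not cite itself
    )


# Hypothetical page texts, for illustration only.
target = "http://www.computer.com/directory/subdirectory/web-page.html"
pages = {
    "http://site-a.example/links.html": "See " + target + " for details.",
    "http://site-b.example/news.html": "Nothing relevant here.",
    target: "This is the page itself: " + target,
}
print(pages_mentioning(target, pages))
```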

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant than an alphabetical list.
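
The "simply stated" rule can be written down in a few lines. This is a sketch of plain link counting only, not of the analysis that Google actually performs; the site names and the list of links are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of links they receive."""
    # links is a list of (source, target) pairs.
    inlinks = Counter(target for _source, target in links)
    # Most linked first; ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-inlinks[site], site))


entries = ["a.example", "b.example", "c.example"]
links = [("x.example", "b.example"), ("y.example", "b.example"), ("x.example", "c.example")]
print(rank_by_inlinks(entries, links))
```

Without any links the function falls back to the alphabetical order of the original directory.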

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
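
The two components described above, an automatically built index and a search function on top of it, can be sketched as follows. This is a toy model under stated assumptions: the pages are given as a hypothetical dictionary instead of being collected by a robot, and all query words must occur in a page for it to be retrieved.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical pages, standing in for what a robot would have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering",
}
index = build_index(pages)
print(search(index, "marine"))
print(search(index, "marine biology"))
```

The query is answered from the index alone, which is why the search does not need to visit the pages themselves at query time.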

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited to a
maximum of 10 words per query.)
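
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis iteration in the spirit of PageRank. This is only a sketch, not the algorithm as implemented by Google: the damping value of 0.85, the number of iterations and the tiny link graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Estimate the importance of pages from the links between them.

    links maps each page to the list of pages it points to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b point to c, c points to a, d points to b.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph page a receives only one link, but that link comes from the highly ranked page c, so a ends up well above b, which is cited only by the unimportant page d.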

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
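
What such an expansion amounts to can be sketched as a rewrite of the query into an OR group. This is an illustration only: the synonym table is hypothetical and tiny, and the synonym list that Google itself uses is not public.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word itself
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left as they are.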

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:
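
The effect of such a categorization can be imitated with a very crude grouping by cue words. This is a sketch under stated assumptions, not the method used by Teoma: the cue words and the result titles are hypothetical, and real systems derive their categories automatically from the retrieved documents.

```python
from collections import defaultdict

# Hypothetical cue words for the two meanings of "pascal".
CUE_WORDS = {
    "philosopher": {"philosopher", "blaise", "mathematician"},
    "programming language": {"programming", "compiler", "language", "code"},
}


def group_results(titles, cue_words=CUE_WORDS):
    """Put every result title under the category whose cue words it shares most."""
    groups = defaultdict(list)
    for title in titles:
        words = set(title.lower().split())
        label = max(cue_words, key=lambda name: len(words & cue_words[name]))
        if not words & cue_words[label]:
            label = "other"  # no cue word found at all
        groups[label].append(title)
    return dict(groups)


titles = [
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler download",
    "Pascal programming language tutorial",
    "Pascal restaurant",
]
for label, members in group_results(titles).items():
    print(label, "->", members)
```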

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
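
Recall and precision, as used above, can be computed as soon as we know which retrieved documents are relevant. A minimal sketch; the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set: 4 documents retrieved, 3 relevant ones exist.
precision, recall = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).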

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
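
To show what truncation means, and how a searcher could work around its absence, the sketch below expands a truncated term into an explicit OR query. The word list is hypothetical and has to be supplied by the searcher; nothing here is a feature of Google.

```python
from fnmatch import fnmatchcase


def expand_truncation(pattern, vocabulary):
    """Turn a truncated term such as 'librar*' into an OR query of full words."""
    matches = sorted(word for word in vocabulary if fnmatchcase(word, pattern))
    return " OR ".join(matches)


# Hypothetical word list from which the variants are taken.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", vocabulary))
```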

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
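
What a proximity operator such as NEAR would test can be sketched on a single text. This is an illustration of the concept only; the function name `near` and the default distance of 5 words are assumptions, and retrieval systems that offer NEAR define the distance in their own way.

```python
import re


def near(text, first, second, max_distance=5):
    """True if the two words occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_first = [i for i, word in enumerate(words) if word == first.lower()]
    positions_second = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in positions_first
        for j in positions_second
    )


sentence = "open access journals in marine science"
print(near(sentence, "open", "science", max_distance=5))
print(near(sentence, "open", "science", max_distance=3))
```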

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
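
The idea of clustering results on the fly can be sketched as follows. This is NOT the algorithm used by Vivisimo, which the slides do not describe; it is a simple assumption-based illustration that groups result snippets under the term they share most widely, using only the retrieved snippets themselves. The stop word list and the sample snippets are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small hypothetical stop word list.
STOP_WORDS = {"a", "an", "and", "at", "by", "in", "of", "the", "to", "with"}


def cluster_results(snippets, query):
    """Group result snippets under the most widely shared term they contain."""
    query_terms = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOP_WORDS - query_terms)

    # Count in how many snippets each candidate term occurs.
    doc_freq = Counter(term for words in tokenised for term in words)

    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        shared = [t for t in words if doc_freq[t] > 1]
        # Pick the most widely shared term; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


SNIPPETS = [
    "John Smith, professor of chemistry",
    "Chemistry papers by J. Smith",
    "Anna Smith plays jazz piano",
    "Jazz concerts with Anna Smith",
    "Smith bakery opening hours",
]
for label, members in cluster_results(SNIPPETS, "smith").items():
    print(label, members)
```

No document is analysed before the search: all grouping happens on the hits that come back for one query.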

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
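
A client-based, multi-threaded search program can be sketched like this. The two "engines" are hypothetical stand-in functions that return fixed results without any network access; a real client would send the query to real search systems instead. This is not how Copernic is implemented, only an illustration of the principle.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote Internet search systems.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    return [f"http://b.example/{query}/1"]


def multi_threaded_search(query, engines):
    """Send the same query to all engines in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


ENGINES = {"engine_a": engine_a, "engine_b": engine_b}
print(multi_threaded_search("ocean", ENGINES))
```

Because the queries run side by side, the total waiting time is close to that of the slowest engine rather than the sum of all of them.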

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
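
The integration step with removal of repeated results could look like the sketch below. The normalisation rules (ignore "www.", letter case of the host, trailing slash and query string) and the round-robin interleaving are assumptions chosen for this example; actual meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Simplification: query strings and fragments are ignored.
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated results."""
    seen = set()
    merged = []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_a = ["http://www.example.org/", "http://example.com/page"]
list_b = ["http://example.org", "http://example.net/"]
print(merge_results([list_a, list_b]))
```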

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
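
One common way to detect that a page is modified or new is to store a fingerprint of each page and compare it on the next visit. The sketch below assumes the page texts have already been fetched; the function names and the use of SHA-256 are choices made for this illustration, not a description of any particular tracking service.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_pages(current_pages, stored_fingerprints):
    """Classify each URL as 'new', 'modified' or 'unchanged' and update the store."""
    report = {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored_fingerprints:
            report[url] = "new"
        elif stored_fingerprints[url] != digest:
            report[url] = "modified"
        else:
            report[url] = "unchanged"
        stored_fingerprints[url] = digest
    return report


store = {}
print(check_pages({"http://example.org/a": "version 1"}, store))
print(check_pages({"http://example.org/a": "version 2"}, store))
```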

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
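
An alert service based on stored queries can be sketched as follows: the stored query is run again from time to time and only hits that were not reported before are sent to the subscriber. This is an assumption about the general principle, not the actual implementation of Google Alert; `fake_search` and `INDEX` are hypothetical stand-ins for the external search system.

```python
def run_alert(query, search, already_reported):
    """Run a stored query and return only the hits that were not reported before."""
    hits = search(query)
    new_hits = [url for url in hits if url not in already_reported]
    already_reported.update(new_hits)
    return new_hits


# Hypothetical stand-in for an external Internet index that grows over time.
INDEX = ["http://example.org/a", "http://example.org/b"]


def fake_search(query):
    return list(INDEX)


reported = set()
print(run_alert("open access", fake_search, reported))  # first run: all hits are new
INDEX.append("http://example.org/c")
print(run_alert("open access", fake_search, reported))  # later run: only the added page
```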

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
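
A stored interest profile of this kind can be matched against newly published books as in the sketch below. The record layout (a title plus a list of subject categories) and the matching rule are assumptions for illustration; they do not describe how Amazon or any other bookshop implements its service.

```python
def matches_profile(book, profile):
    """A book matches when it shares a subject category or a title keyword with the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = bool(set(book["categories"]) & set(profile["categories"]))
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]


# Hypothetical profile and book records.
PROFILE = {"keywords": ["thesaurus"], "categories": ["oceanography"]}
NEW_BOOKS = [
    {"title": "Building a Thesaurus", "categories": ["library science"]},
    {"title": "Currents and Tides", "categories": ["oceanography"]},
    {"title": "Cooking for Two", "categories": ["food"]},
]
print(new_book_alerts(NEW_BOOKS, PROFILE))
```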

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
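
The procedure described here can be sketched as a small query expansion routine. The thesaurus entry below is hypothetical, and the choice to add only synonyms (UF) and narrower terms (NT) by default is an assumption; which relations help depends on the database and the information need.

```python
# Hypothetical thesaurus entry, for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly raises recall; replacing a vague term by a more suitable one from the thesaurus mainly raises precision.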

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
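
Both directions of citation searching can be illustrated with a small citation graph. The data are hypothetical; `snowball` follows the reference lists backwards in time, while `build_citation_index` inverts those lists so that more recent citing documents can be looked up, which is the principle of a citation index.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {cited for doc in frontier for cited in references.get(doc, [])} - found
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = defaultdict(set)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].add(citing)
    return index


print(sorted(snowball("seed", REFERENCES)))
print(sorted(build_citation_index(REFERENCES)["seed"]))
```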

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
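The first application above, measuring impact by the number of links received, can be illustrated with a minimal sketch. It assumes that a list of (source page, target page) link pairs has already been collected; the function name count_inlinks and the sample data are hypothetical and do not describe any particular search engine.

```python
from collections import Counter

def count_inlinks(links):
    """Count how many distinct pages link to each target page.

    links: iterable of (source_page, target_page) pairs.
    """
    # A set removes repeated links from the same source to the same target.
    unique_links = set(links)
    # Links from a page to itself are ignored: they are not citations by others.
    return Counter(target for source, target in unique_links if source != target)

# Hypothetical sample data: pages "a" and "b" both link to page "c".
sample = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "c"), ("c", "a")]
print(count_inlinks(sample).most_common())
```

Counting each citing page only once is a design choice made here; a real system may count or weight links differently.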

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
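The variants listed above can be generated automatically before running a link search. The following is only a sketch: the function name url_variants is hypothetical, it assumes that the default page is called index.html or index.htm, and it returns the variants without the "http://" prefix.

```python
def url_variants(url: str) -> list[str]:
    """Return common spellings of one web address, without the scheme."""
    # Drop the scheme ("http://", "https://") if present.
    base = url.split("://", 1)[-1]
    # Drop a trailing default page name (assumed: index.html or index.htm).
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    # Drop any trailing slash to obtain the shortest form.
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query, using the syntax of the chosen search system.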

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 101

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
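The two components described above (an index built from the texts of collected pages, and a search function on top of it) can be sketched in a few lines. This is a toy illustration, not the implementation of any real search engine: the pages are given as a small dictionary instead of being collected by a robot, and combining the query words with an implicit AND is an assumption made here for simplicity.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    pages: dict mapping URL -> page text (standing in for crawled pages).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, only for illustration.
pages = {
    "u1": "Marine biology of the North Sea",
    "u2": "Marine science and oceanography",
    "u3": "Civil engineering",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

Because the index is built in advance, a query only looks up words in the index and does not visit the pages themselves, which is why such systems can answer quickly.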

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
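The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified link-analysis sketch in the style of the published PageRank method. This is not Google's actual ranking algorithm; the function name link_rank, the damping value 0.85 and the sample link graph are assumptions for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Estimate page importance from the link structure.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score, independent of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "a", "b" and "d" link to "c"; "c" links to "a".
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sample, page "a" receives only one link, but it comes from the highly ranked page "c", so "a" ends up above "b" and "d", which receive no links at all.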

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
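What such an automatic expansion amounts to can be sketched as follows. How Google selects synonyms internally is not described here, so the synonym table below is purely hypothetical, as is the function name expand_query; the sketch only shows the principle of replacing a marked word by an OR group.

```python
def expand_query(query, synonyms):
    """Expand every term marked with a leading '~' into an OR group.

    synonyms: dict mapping a lower-case word to a list of synonyms
    (a hypothetical, hand-made table).
    """
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

table = {"pollution": ["contamination"]}
print(expand_query("marine ~pollution sources", table))
```

The expanded query retrieves documents that use either word, which increases recall, possibly at the cost of precision.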

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
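Recall and precision, mentioned above, can be computed when both the set of retrieved documents and the set of relevant documents are known. The following sketch uses the standard definitions; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print("precision:", p, "recall:", r)
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.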

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
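Where a search system does not offer truncation, the searcher can write out the word variants and combine them with OR. The sketch below illustrates the idea in Python; the vocabulary list is a hypothetical stand-in for whatever word list the searcher has at hand, and the OR syntax should be checked against the target search system.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term (ending in '*') into matching words."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


def to_or_query(words):
    """Join the variants into one Boolean OR expression."""
    return "(" + " OR ".join(words) + ")"


# Hypothetical word list; in practice this could come from a dictionary.
vocabulary = ["library", "libraries", "librarian", "liberty", "index"]
variants = expand_truncation("librar*", vocabulary)
query = to_or_query(variants)
```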

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
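The relations in this scheme (one user query, forwarded to several search systems that each consult their own database) can be sketched as follows. This is only an illustration under stated assumptions: the two search functions are hypothetical stand-ins that return fixed results, not calls to any real search service.

```python
from concurrent.futures import ThreadPoolExecutor


def search_system_1(query):
    # Hypothetical stand-in for a real search system; returns (url, title) pairs.
    return [("http://example.org/a", "A"), ("http://example.org/b", "B")]


def search_system_2(query):
    # Second hypothetical system with its own, partly overlapping database.
    return [("http://example.org/b", "B"), ("http://example.org/c", "C")]


def meta_search(query, systems):
    """Send the same query to every system in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        # pool.map keeps the result lists in the same order as the systems.
        return list(pool.map(lambda system: system(query), systems))


all_results = meta_search("open access", [search_system_1, search_system_2])
```

Running the searches in parallel threads is what the term "multi-threaded search system" refers to: the total waiting time is close to that of the slowest system, not the sum of all.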

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
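The integration step with removal of repeated results can be sketched as follows. This is a simple illustration, not the method of any particular meta-search system: the ranked lists are interleaved and a URL is kept only the first time it appears. The URL normalisation is deliberately crude and is an assumption of this sketch.

```python
def normalise(url):
    """Crude normalisation (an assumption): lower-case, drop a trailing slash.

    Real URLs can be case-sensitive in the path, so this may merge too much.
    """
    return url.strip().lower().rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked lists from two primary search systems.
list_1 = ["http://example.org/a", "http://example.org/b/"]
list_2 = ["http://example.org/b", "http://example.org/c"]
merged = merge_results([list_1, list_2])
```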

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
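The difference between a modified page and a new page can be illustrated with a small comparison of two snapshots. This is a minimal sketch, assuming the page texts have already been fetched and stored as a dictionary of URL to text; the function names and sample data are hypothetical and do not describe how any of the services mentioned here work internally.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots {url: text}; return (modified, new) URL lists."""
    modified = sorted(
        url for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    new = sorted(url for url in current if url not in previous)
    return modified, new


# Hypothetical snapshots taken at two different moments.
previous = {
    "http://example.org/news": "old text",
    "http://example.org/about": "same",
}
current = {
    "http://example.org/news": "new text",
    "http://example.org/about": "same",
    "http://example.org/report": "fresh",
}
modified, new = detect_changes(previous, current)
```

Storing only the fingerprint instead of the full text is enough to notice that a page has changed, although not what has changed.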

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
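The procedure described above (look up a term in a thesaurus first, then build the query from the terms found) can be sketched as follows. The thesaurus entry is a hypothetical example using the relation codes BT, NT, RT and UF; it is not taken from any of the thesaurus systems mentioned in these slides, and the OR syntax should be checked against the target database.

```python
# Hypothetical thesaurus entry, for illustration only.
thesaurus = {
    "sea": {
        "BT": ["body of water"],  # broader term
        "NT": ["bay"],            # narrower term
        "RT": ["coast"],          # related term
        "UF": ["ocean"],          # used for (treated as a synonym here)
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    # Put multi-word terms between quotes so they are searched as a phrase.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


narrow_query = expand_query("sea", thesaurus)
broad_query = expand_query("sea", thesaurus, relations=("UF", "BT"))
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term from the thesaurus mainly increases precision.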

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
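Both directions can be illustrated with a small data structure. In this sketch, the reference lists of a few hypothetical papers are inverted to obtain a citation index; the paper identifiers are invented for the example, and the code does not describe how any commercial citation index is actually built.

```python
from collections import defaultdict

# Hypothetical reference lists: citing document -> documents it cites.
references = {
    "paper_2001": ["paper_1995", "paper_1998"],
    "paper_2003": ["paper_1998", "paper_2001"],
    "paper_2004": ["paper_2001"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return {doc: sorted(citers) for doc, citers in cited_by.items()}


citation_index = build_citation_index(references)

seed = "paper_2001"
older = references[seed]          # backwards in time: the seed's own references
newer = citation_index[seed]      # forwards in time: documents citing the seed
citation_count = len(newer)       # number of citations received by the seed
```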

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
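
A small helper can generate these variants, so that none is forgotten when searching for links. This is a sketch under an assumption: the default file of the site is taken to be `index.html`, which differs between web servers. The function name is hypothetical.

```python
def url_variants(url, index_name="index.html"):
    """Return the usual spellings of one address, to be searched one by one."""
    base = url
    # Strip an explicit index file, if present.
    if base.endswith("/" + index_name):
        base = base[: -len(index_name)]
    # Strip the trailing slash so that all variants start from the same base.
    base = base.rstrip("/")
    return [base + "/" + index_name, base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then used in a separate link search, and the result sets are combined.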

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
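
The difference between the two approaches can be illustrated with a toy collection of pages. The page addresses and contents below are invented, and the two function names are hypothetical; a real search engine works on its own index, not on a Python dictionary.

```python
target = "http://www.computer.com/directory/subdirectory/web-page.html"

# Hypothetical toy collection: each page has hyperlink targets and visible text.
pages = {
    "http://a.example/": {"links": [target], "text": "A useful page."},
    "http://b.example/": {"links": [], "text": "See " + target + " for details."},
    "http://c.example/": {"links": [], "text": "Nothing relevant here."},
}

def find_by_link_field(pages, target):
    """Specialised searching: look only in the hyperlink fields."""
    return [url for url, page in pages.items() if target in page["links"]]

def find_by_text(pages, target):
    """Normal searching: look for the URL as a string in the page text."""
    return [url for url, page in pages.items() if target in page["text"]]

citing_pages = sorted(set(find_by_link_field(pages, target)) | set(find_by_text(pages, target)))
```

In this toy collection the two methods find different pages, which is why it can be useful to combine them.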

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 102

1


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
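
The "simply stated" version of this ranking can be sketched as follows. The site names and the link list are invented toy data, and the function name is hypothetical; the real link analysis by Google is more elaborate than a plain count.

```python
from collections import Counter

def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by the number of links they receive (most first).

    Ties are broken alphabetically, which is the plain DMOZ order.
    """
    inlinks = Counter(target for _source, target in links)
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))

entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]
ranked = rank_by_inlinks(entries, links)
```

Here the alphabetical order alpha, beta, gamma is reversed, because gamma receives two links, beta one and alpha none.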

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: starting from the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically (machine-made) on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
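
The core of such a system, an index from words to pages, can be sketched in Python. This is a minimal illustration with invented pages; the "robot" that collects pages is left out, and the function names are hypothetical. Real Internet indexes add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build a word -> set of URLs index from the texts of collected pages."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs of pages that contain ALL words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, as if already collected by the robot.
pages = {
    "http://one.example/": "Marine biology of the North Sea",
    "http://two.example/": "Hydrology and marine science",
    "http://three.example/": "Library science",
}
index = build_index(pages)
hits = search(index, "marine science")
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why searching is fast but can return outdated results.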

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
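
The idea that a link from an "important" page counts for more can be sketched with a simplified link-analysis iteration, in the spirit of the published PageRank idea. This is NOT the actual Google implementation: the damping value 0.85, the number of iterations, the toy link table and the function name are all assumptions made for this illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Give each page a score that depends on the scores of the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score

# Hypothetical toy web: "hub" receives many links; "target" and "lonely"
# each receive exactly one link, but from pages of different importance.
links = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["target"],
    "d": ["lonely"],
}
scores = link_scores(links)
```

In this toy web, "target" ends up with a higher score than "lonely", although both receive one link: the link to "target" comes from the much-linked "hub".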

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
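
What such an automatic expansion amounts to can be sketched as follows. The synonym table is an invented example and the function name is hypothetical; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table, only for illustration.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + synonyms.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(term)  # no synonyms known: keep the word as it is
        else:
            parts.append(word)
    return " ".join(parts)

expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; "cheap" is left as it is, although the table contains a synonym for it.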

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
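
When several search systems are used for the same query, their result lists have to be combined and duplicates removed. A minimal sketch, with invented result lists and a hypothetical function name; interleaving by rank is just one simple choice, not the method of any particular system.

```python
def merge_results(*result_lists):
    """Interleave ranked result lists from several systems, dropping duplicates."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged

# Hypothetical ranked results of two search systems for the same query.
engine_a = ["http://u1.example/", "http://u2.example/", "http://u3.example/"]
engine_b = ["http://u2.example/", "http://u4.example/"]
combined = merge_results(engine_a, engine_b)
```

The combined list contains four different pages, more than either system found on its own.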

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
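
The terms recall and precision used above can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers and the relevance judgements are invented, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: document ids returned by the search system
    relevant:  document ids that are truly relevant (assumed to be known)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.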

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
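
When a search system offers no truncation, the searcher can write out the word variants and combine them with OR. The following is a minimal sketch of that workaround; the word list is invented, the function names are hypothetical, and the `OR` syntax stands for generic Boolean searching rather than the documented syntax of any particular system.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into full words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    # Keep every known word that starts with the stem, in alphabetical order.
    return sorted(w for w in set(vocabulary) if w.lower().startswith(stem))


def or_query(terms):
    """Join terms into a generic Boolean OR expression."""
    return " OR ".join(terms)


# Hypothetical word list known to the searcher.
words = ["library", "libraries", "librarian", "liberty", "book"]
query = or_query(expand_truncation("librar*", words))
```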

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: the user (In / Out) works at a client computer with a WWW client program; this contacts a WWW server computer, which in turn queries, over the Internet / WWW, the WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (In / Out) works at a client computer with a multi-threaded Internet search client program; this queries directly, over the Internet / WWW, the WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user (In / Out) works at a client computer with a WWW client program; this contacts a WWW server computer, which in turn queries, over the Internet / WWW, the WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
[Diagram: the user queries an Internet meta-search system, which passes the query on to Internet search system 1 and Internet search system 2; each Internet search system searches its own collected database (1 and 2), built from WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
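
To give an idea of what clustering results "on the fly" can mean, here is a deliberately naive sketch that groups result titles under the word they share most often. This is NOT the method used by Vivisimo, which is not described here; the stop word list, the function names and the example titles are all invented for illustration.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to"}


def label_words(title, query_terms):
    """Lower-cased title words, without stop words and query terms."""
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return {w for w in words
            if w and w not in STOP_WORDS and w not in query_terms}


def cluster_results(titles, query):
    """Group result titles under the most widely shared remaining word."""
    query_terms = set(query.lower().split())
    words_per_title = [label_words(t, query_terms) for t in titles]
    # In how many titles does each candidate label occur?
    doc_freq = Counter(w for words in words_per_title for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        # Pick the word shared by most results; break ties alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)


hits = [
    "Jaguar cars for sale",
    "Classic Jaguar cars",
    "Jaguar habitat in the rainforest",
    "Rainforest animals: the jaguar",
]
groups = cluster_results(hits, "jaguar")
```

Only the retrieved titles are analysed, after the search, so no pre-processing of the documents is needed. The ambiguous query "jaguar" is split here into a "cars" group and a "rainforest" group.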

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user (In / Out) works at a client computer with a multi-threaded Internet search client program; this queries directly, over the Internet / WWW, the WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
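
The time saving comes from sending one query to several search systems at the same time. A minimal sketch of this idea follows, assuming two hypothetical stand-in functions (`engine_a`, `engine_b`) instead of real search systems; no network access is performed and the returned addresses are invented.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems: each takes a query
# and returns a list of result URLs. No network access is performed.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, query) for engine in engines]
        # Collect in submission order so the output is reproducible.
        return [future.result() for future in futures]


result_lists = meta_search("oceanography", [engine_a, engine_b])
```

The user formulates the query once; the program takes care of the individual search systems.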

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
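
Integration with removal of repeated results could look roughly like the sketch below. It is an assumption-laden illustration, not the method of any particular meta-search system: the ranked lists are interleaved round-robin, and two addresses count as the same result when they differ only in a leading "www." or a trailing "/". Query strings are ignored, which is a simplification.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing '/'."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                key = normalise(lst[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged


merged = merge_results([
    ["http://www.dogpile.com/", "http://vivisimo.com/"],
    ["http://dogpile.com", "http://www.kartoo.com"],
])
```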

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet]
• telnet, ftp, ...
• Internet / WWW:
»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Word files, PDF files
»Databases and file archives accessible through
the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Available since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
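
One simple way in which a tracking system can detect that a page has been modified is to store a fingerprint of its content and to compare it at the next visit. The sketch below assumes this approach; it is not a description of any particular service. The page text is supplied by hand instead of being downloaded, and the address and function names are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_page(url, page_text, stored):
    """Compare a page with its stored fingerprint.

    Returns 'new', 'changed' or 'unchanged' and updates `stored`.
    """
    current = fingerprint(page_text)
    previous = stored.get(url)
    stored[url] = current
    if previous is None:
        return "new"
    return "unchanged" if previous == current else "changed"


# The page text would normally be downloaded; here it is supplied by hand.
store = {}
first = check_page("http://www.example.org/", "version 1", store)
second = check_page("http://www.example.org/", "version 1", store)
third = check_page("http://www.example.org/", "version 2", store)
```

A real alert service would run such a check at regular intervals and send an email when the answer is "new" or "changed".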

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations starting from a term:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled
vocabulary, the searcher can first consult one or several
thesaurus systems to find more, and more suitable, words
and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
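
The use of a thesaurus to formulate a query can be sketched as follows. The thesaurus entry is invented and very small, the relation codes (UF, NT, BT, RT) follow the slide on thesaurus relations, and the `OR` syntax stands for generic Boolean searching; the function name `expand_query` is hypothetical.

```python
# Tiny invented thesaurus entry, for illustration only.
THESAURUS = {
    "ocean": {
        "UF": ["sea"],
        "NT": ["Atlantic Ocean"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        for candidate in entry.get(rel, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("ocean")
```

Adding synonyms and narrower terms mainly increases recall; adding broader terms as well would increase recall further, but normally at the cost of precision.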

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
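
Both directions of citation searching can be illustrated with a small, invented set of citation data: following the reference lists backwards (the snow-ball effect) and looking up more recent citing documents (what a citation index offers). The document names and function names below are hypothetical.

```python
# Invented citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The references of this document are followed in turn.
            queue.extend(cites.get(doc, []))
    return found


def citing_documents(seed, cites):
    """Look up more recent documents that cite the seed (a citation index)."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


older = snowball("seed", CITES)
newer = citing_documents("seed", CITES)
```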

211

***-

Information retrieval:
using citations (scheme)
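[Diagram: a time axis (Past, Now, Future) with the seed document on it; snowball citation searching leads from the seed document back to the past, citation indexing leads from the seed document forward to more recent documents.]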

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.
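As a rough illustration of how a system can find pages that link to a known URL, the sketch below extracts the hyperlinks from a few pages and reports the pages that point to a target. The page contents and the example.* addresses are hypothetical; a real search engine does this on its own crawled database, not on a small dictionary.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def pages_linking_to(target_url, pages):
    """Return the URLs of the pages that contain a hyperlink to target_url."""
    citing = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target_url in collector.links:
            citing.append(page_url)
    return citing

# Hypothetical mini-collection of crawled pages.
pages = {
    "http://a.example/": '<p>See <a href="http://target.example/doc.html">this</a></p>',
    "http://b.example/": "<p>Plain text mention of http://target.example/doc.html</p>",
    "http://c.example/": '<a href="http://other.example/">x</a> '
                         '<a href="http://target.example/doc.html">y</a>',
}
citing_pages = pages_linking_to("http://target.example/doc.html", pages)
print(citing_pages)
```

Note that the second page mentions the URL only as text, so a search limited to hyperlinks does not find it.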

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
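The variants listed above can be generated mechanically. The helper below is a minimal sketch; the function name is hypothetical, and it assumes that the default page of the site is called index.html, as in the example on the slide.

```python
def url_variants(url):
    """Return the variants of a site URL that are worth searching for."""
    # Drop the scheme, because the variants on the slide start at "//".
    base = url.split("//", 1)[-1]
    # Reduce the URL to the site directory without a trailing slash.
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("http://web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link query, following the syntax of the chosen search system.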

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: for the complete WWW, a global subject directory can lead to a directory limited to sources in/of a country or region, or to a directory restricted to a specific subject domain (“portal”).]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
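The mechanism described above (collect pages, build an index from their text, search that index instead of the live Internet) can be sketched as follows. The pages are a hypothetical, already collected mini-collection, and combining the query words with AND is an assumption of this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical pages, as if collected earlier by a "robot".
pages = {
    "http://example.org/oceans": "Marine science and oceanography resources",
    "http://example.org/fish": "Fishing and marine biology",
    "http://example.org/library": "University library catalogue",
}
index = build_index(pages)
print(search(index, "marine"))
print(search(index, "marine biology"))
```

The query is answered from the prepared index only, which is why a page that has changed since the last visit of the robot can still be retrieved with its old contents.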

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
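The two ranking criteria above (many pages point to it, important pages point to it) can be illustrated with a simplified link-analysis iteration in the style of the published PageRank idea. This is a sketch under assumptions, not the actual ranking used by Google: the link graph, the damping value and the number of iterations are all chosen only for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Give each page a score based on the scores of the pages linking to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph, page c scores well because two pages point to it, and page d scores well because the single page pointing to it is itself highly ranked.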

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
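What such an automatic expansion amounts to can be sketched with a small thesaurus lookup. The thesaurus entries below are hypothetical, and the OR notation in the output is a generic Boolean notation chosen for this sketch; it does not show how any search engine works internally.

```python
# Hypothetical mini-thesaurus: word -> synonyms.
THESAURUS = {
    "ocean": ["sea", "marine"],
    "fish": ["fishes"],
}

def expand_query(query, thesaurus):
    """Replace each word marked with a tilde by an OR-group of synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + thesaurus.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(term)
        else:
            parts.append(word)
    return " ".join(parts)

expanded = expand_query("pollution ~ocean fish", THESAURUS)
print(expanded)
```

Only the marked word is expanded; the other words of the query are left as they were typed.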

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: until 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found,
for instance, in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
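
Recall and precision, the two measures named on this slide, can be made concrete with a small sketch. The function name and the sample sets of document identifiers are illustrative assumptions and do not come from any particular search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of document ids returned by the search system
    relevant:  set of document ids that are truly relevant
    """
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 of them relevant,
# while 6 relevant documents exist in total.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d7", "d8", "d9"}
p, r = precision_recall(retrieved, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Precision is the share of the retrieved documents that are relevant; recall is the share of all relevant documents that were retrieved. A specialised index may improve either one, depending on how it selects its sources.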

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search
(at least in most cases),
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
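
To illustrate what truncation and stemming mean, here is a minimal sketch. It assumes a small invented vocabulary, and the stemmer is a deliberately crude suffix stripper written for this example; it is not the algorithm of any real search system.

```python
from fnmatch import fnmatchcase


def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a truncated term such as 'librar*'."""
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))


def crude_stem(word):
    """Strip a few common English suffixes (illustration only)."""
    for suffix in ("ies", "es", "s", "ing", "ed"):
        # Keep at least three letters so that short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty", "search"}
print(expand_truncation("librar*", vocabulary))
print({word: crude_stem(word) for word in ("searching", "searched", "search")})
```

With manual truncation the searcher writes the wildcard explicitly; with automatic stemming the system reduces query words and index words to a common stem without being asked.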

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
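
A proximity operator such as NEAR, mentioned in the list above, restricts results to documents in which two terms occur close to each other. The sketch below shows one possible interpretation; the function name, the default distance and the sample sentence are assumptions made for this example.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


text = ("Blaise Pascal was a philosopher; much later a programming "
        "language was named Pascal.")
print(near(text, "pascal", "philosopher", max_distance=3))
print(near(text, "blaise", "language", max_distance=3))
```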

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements shown: user (input / output);
client computer with WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements shown: user (input / output);
client computer with multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: user (input / output);
client computer with WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
[Diagram. Relations shown: user; an Internet meta-search system;
Internet search systems 1 and 2, each with its own collected database;
WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
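
How results might be clustered on the fly can be sketched with a toy example. This is not Vivisimo's method, which the slides do not describe; it is a minimal illustration that groups result snippets under the non-query word shared by the most snippets. The stop-word list and the snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "his", "of", "the"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query word."""
    query_words = set(query.lower().split())
    doc_words = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        doc_words.append(words - STOPWORDS - query_words)
    # Document frequency: in how many snippets does each word occur?
    df = Counter(word for words in doc_words for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, doc_words):
        # Pick the word shared by most snippets; break ties alphabetically.
        label = min(words, key=lambda w: (-df[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "Blaise Pascal French philosopher and mathematician",
    "Pascal the philosopher and his writings",
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal programming language",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, "->", members)
```

Nothing is computed before the query arrives: the grouping uses only the snippets returned for this one search, which is the sense in which such clustering works without pre-processing.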

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements shown: user (input / output);
client computer with multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other,
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
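
The integration step described above (querying several primary systems in parallel and removing repeated results) can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query over the network. The URL normalisation rule is also a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


# Hypothetical stand-ins for primary search systems.
def engine_a(query):
    return ["http://www.example.org/pascal", "http://example.com/blaise/"]


def engine_b(query):
    return ["http://example.com/blaise", "http://www.example.net/language"]


def normalise(url):
    """Reduce a URL to a key so that trivially different forms compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:  # keep the order of the engines
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


print(meta_search("pascal", [engine_a, engine_b]))
```

Because the engines are queried in parallel threads, the total waiting time is roughly that of the slowest engine rather than the sum of all of them.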

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Parts of the Internet shown:
telnet, ftp, ...;
databases and file archives accessible through the Internet (CGI, ASP, ...);
static texts in the WWW that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes;
information accessible only when passwords are used;
rapidly changing information, such as news;
Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
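
The difference between a modified page and a new page can be shown with a minimal sketch of a tracking service. It assumes that the service stores a fingerprint of each page at every visit; the URLs and page texts are invented, and the fetching of pages over the network is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_pages(stored, current):
    """Compare stored fingerprints with the current page contents.

    stored:  dict mapping URL -> fingerprint from the previous visit
    current: dict mapping URL -> page text fetched now
    Returns (modified_urls, new_urls).
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in stored:
            new.append(url)
        elif stored[url] != fingerprint(text):
            modified.append(url)
    return modified, new


# Hypothetical pages from a previous and a current visit.
stored = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
print(check_pages(stored, current))
```

An alert service would run such a comparison periodically and send the lists of modified and new pages to the subscriber by e-mail.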

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic: ?

To find book titles published before 1990: ?

To find a book title through a title search: ?

To find the price of a book: ?

To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
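
The procedure described above (consult a thesaurus first, then formulate the query) can be sketched as follows. The mini-thesaurus is invented for this example, and the choice to add only UF and NT terms is an assumption; a searcher may also decide to include related or broader terms.

```python
# Hypothetical mini-thesaurus; the entries are invented for illustration.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms the term is used for
        "NT": ["coastal waters"],   # narrower terms
        "BT": ["water body"],       # broader terms
        "RT": ["tide"],             # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Quote multi-word terms so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly raises recall; replacing a vague query word with a more specific term found in the thesaurus mainly raises precision.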

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
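
Both directions can be illustrated with a small Python sketch. The data (REFERENCES) and the function names are hypothetical; a real citation index is of course built from the reference lists of a very large number of publications.

```python
from collections import deque

# Hypothetical data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": [],
    "X": ["seed"],
    "Y": ["X", "A"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, breadth first."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for cited in references.get(doc, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for doc, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(doc)
    return index


older = snowball("seed", REFERENCES)                         # towards the past
newer = build_citation_index(REFERENCES).get("seed", [])     # towards the future
print(older, newer)
```

The number of entries in the citation index for a document is also the simplest count of citations received by that document.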

211

***-

Information retrieval:
using citations (scheme)
Scheme on a time axis (past, now, future):
starting from the seed document,
snowball citation searching leads to the past;
citation indexing leads to the future.

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
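
The variants listed above can be generated with a small helper. This is a sketch only: the function name url_variants is made up, and it assumes that the default page of the site is called index.html, which depends on the web server.

```python
def url_variants(url, default_page="index.html"):
    """Return the forms under which the same start page may be linked."""
    base = url
    # Strip an explicit default page, if present.
    if base.endswith("/" + default_page):
        base = base[: -len(default_page) - 1]
    # Strip a trailing slash, so that all variants start from the same base.
    base = base.rstrip("/")
    return [base + "/" + default_page, base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant can then be used in a separate link query, and the results can be merged afterwards.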

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 104

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: a global subject directory, covering the complete WWW,
can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
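
The mechanism can be sketched as follows. This is a strongly simplified illustration with hypothetical pages and URLs (PAGES); real systems add crawling, ranking, phrase searching and much more. The point is that a query is answered from the index that was built beforehand, not by visiting the pages at query time.

```python
import re
from collections import defaultdict

# Hypothetical pages, as if already collected by a crawler ("robot").
PAGES = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Open access journals in marine biology",
    "http://example.org/c": "Fishing and open sea fisheries",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    url_sets = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*url_sets))


INDEX = build_index(PAGES)
hits = search(INDEX, "marine open")
print(hits)
```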

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
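
The two ranking criteria above can be illustrated with a simplified, PageRank-style calculation. This is a sketch under assumptions, not the actual implementation used by Google: the link data (LINKS), the function name rank_pages, the damping value 0.85 and the fixed number of iterations are all chosen for illustration only.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Rank pages so that links from many pages, and from pages that
    themselves score high, give a higher position."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return sorted(pages, key=lambda page: score[page], reverse=True)


# Hypothetical link structure: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"], "e": ["d"]}

ranking = rank_pages(LINKS)
print(ranking[:2])
```

In this toy example page c ranks first, because three pages link to it, one of which (d) itself receives a link from c.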

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
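
The terms recall and precision used above can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming the set of relevant documents is known in advance (which is rarely the case in practice); the document identifiers and the two result lists are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data, for illustration only.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d7", "d8", "d9", "d2"]
specialised_engine = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_engine, relevant_docs))
print(precision_recall(specialised_engine, relevant_docs))
```

In this invented example the specialised result list scores higher on both measures, which is the situation the slide describes.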

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
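
To illustrate what truncation and stemming mean, here is a minimal sketch in Python. It assumes `*` as the truncation symbol and uses a deliberately crude suffix-stripping function; `truncation_to_regex` and `crude_stem` are hypothetical helper names, and the stemmer is not one of the established algorithms.

```python
import re


def truncation_to_regex(pattern):
    """Translate a truncated query term such as 'librar*' into a regex."""
    body = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile("^" + body + "$", re.IGNORECASE)


def crude_stem(word):
    """Very rough suffix stripping; only an illustration of stemming."""
    word = word.lower()
    for suffix in ("ies", "es", "s", "ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["library", "libraries", "librarian", "liberty"]
matcher = truncation_to_regex("librar*")
# Manual truncation: one query term matches several word forms.
truncated_matches = [w for w in words if matcher.match(w)]

# Automatic stemming: different word forms are reduced to one stem.
stems = {crude_stem(w) for w in ["search", "searches", "searching", "searched"]}
```

With truncation the searcher decides where the word is cut off; with stemming the system reduces word forms automatically.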

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
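
The proximity operator mentioned above can be sketched as follows. This is a minimal illustration in Python, assuming that NEAR means "within a given number of words, in either order"; the function name `near` and the default distance are assumptions, not the definition used by any particular search system.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


text = "Open access journals are indexed in a directory of scientific journals"
close = near(text, "open", "journals", max_distance=2)
far = near(text, "open", "directory", max_distance=3)
```

A plain AND query would accept both pairs of terms; the proximity test accepts only the pair that occurs close together.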

96

**--

Meta-search systems:
scheme 1
[Scheme: the user's client computer with a WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems. Arrows labelled In and Out.]

97

**--

Meta-search systems:
scheme 2
[Scheme: the user's client computer with a multi-threaded Internet search client program; the Internet / WWW; WWW server computers with Internet search systems. Arrows labelled In and Out.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Scheme: the user's client computer with a WWW client program; a WWW server computer; the Internet / WWW; WWW server computers with Internet search systems. Arrows labelled In and Out.]

100

**--

Meta-search systems:
relations
[Scheme: the user queries an Internet meta-search system, which queries Internet search system 1 and Internet search system 2; each Internet search system has its own collected database (1 and 2) of WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
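
Clustering "on the fly" can be illustrated with a toy example. The sketch below is in Python and is only an assumption about how such grouping could work in principle; it is not the method used by Vivisimo. Each retrieved snippet is labelled with the non-query word that it shares with the most other snippets; the snippets and the stop-word list are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "for", "of", "the"}


def terms(snippet, query):
    """Words of a snippet, without stop words and without the query word."""
    words = set(re.findall(r"\w+", snippet.lower()))
    return words - STOPWORDS - {query.lower()}


def cluster(snippets, query):
    """Group snippets under the word they share with most other snippets."""
    term_sets = [terms(s, query) for s in snippets]
    doc_freq = Counter(t for ts in term_sets for t in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, term_sets):
        # Most widely shared word first; ties are broken alphabetically.
        label = min(ts, key=lambda t: (-doc_freq[t], t)) if ts else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language compiler",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Thoughts of the philosopher Blaise Pascal",
]
groups = cluster(snippets, "pascal")
```

The grouping uses only the result snippets returned for the query, so no pre-processing of the complete document collection is needed.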

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme: the user's client computer with a multi-threaded Internet search client program; the Internet / WWW; WWW server computers with Internet search systems. Arrows labelled In and Out.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
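
The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration in Python: the two "search systems" are stand-in functions that return fixed, invented lists (no real search system is contacted), and the rule that two URLs are the same when they differ only by "www." or a trailing slash is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.' plus path."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


# Stand-ins for primary search systems; the results are invented.
def engine_a(query):
    return ["http://www.example.org/page1", "http://example.org/page2/"]


def engine_b(query):
    return ["http://example.org/page1/", "http://example.net/other"]


def meta_search(query, engines):
    """Query all engines in parallel threads and merge without duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


merged_results = meta_search("open access", [engine_a, engine_b])
```

The queries run in parallel, which is where the name "multi-threaded search system" comes from; the merged list keeps the first occurrence of each page.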

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme showing the parts of the Internet:
• telnet, ftp, ...
• WWW: static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• databases and file archives accessible through the Internet (CGI, ASP,...)
• information accessible only when passwords are used
• rapidly changing information, such as news
• Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
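
The difference between a modified page and a new page can be sketched as follows. This is a minimal illustration in Python, assuming that the tracking service stores one fingerprint per page from its previous visit; the URLs and page texts are invented and nothing is fetched from the Internet.

```python
import hashlib


def fingerprint(page_text):
    """Short, fixed-length summary of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """previous: URL -> stored fingerprint; current: URL -> page text now.

    Returns (modified, new) as two sorted lists of URLs.
    """
    modified = sorted(url for url, text in current.items()
                      if url in previous and previous[url] != fingerprint(text))
    new = sorted(url for url in current if url not in previous)
    return modified, new


# Hypothetical state stored at the previous visit.
previous = {
    "http://example.org/a": fingerprint("version 1"),
    "http://example.org/b": fingerprint("unchanged"),
}
# Hypothetical state found at the current visit.
current = {
    "http://example.org/a": "version 2",
    "http://example.org/b": "unchanged",
    "http://example.org/c": "brand new page",
}
modified, new = changed_pages(previous, current)
```

An alert service would then send the two lists to the subscriber by email.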

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of bookshop catalogues
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
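
How a stored interest profile can be compared with a new book is sketched below. This is a minimal illustration in Python with invented data; the field names `title`, `category`, `keywords` and `categories` are assumptions and do not describe the system of any bookshop.

```python
def matches_profile(book, profile):
    """True if the book fits the keywords or the subject categories."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile["keywords"]}
    keyword_hit = bool(title_words & keywords)
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile stored on the server.
profile = {"keywords": ["ecology", "aquaculture"],
           "categories": {"Oceanography"}}

# Hypothetical newly published books.
new_books = [
    {"title": "Marine ecology of the North Sea", "category": "Biology"},
    {"title": "Medieval trade routes", "category": "History"},
    {"title": "Tides and currents", "category": "Oceanography"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```

Only the books that match the profile would lead to an alert.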

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
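
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration in Python: the thesaurus entry for "sea" is invented, and the choice to expand the query with UF (synonyms) and NT (narrower terms) combined by OR is an assumption about one reasonable search strategy.

```python
# Hypothetical thesaurus entry, using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine science"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and the selected related terms."""
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    # Terms that consist of several words are put between quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in expanded)


query = expand_query("sea", THESAURUS)
broader_query = expand_query("sea", THESAURUS, relations=("UF", "BT"))
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term from the thesaurus can also increase precision.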

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
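
The two directions can be sketched in Python as follows. This is only an illustration: the citation data in CITES are invented, and the function names snowball and citing_documents are hypothetical, not part of any citation database.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str) -> set[str]:
    """Backward search: follow reference lists repeatedly (snowball effect)."""
    found: set[str] = set()
    todo = [seed]
    while todo:
        doc = todo.pop()
        for ref in CITES.get(doc, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def citing_documents(seed: str) -> set[str]:
    """Forward search, as offered by a citation index: documents that cite the seed."""
    return {doc for doc, refs in CITES.items() if seed in refs}


print(sorted(snowball("seed")))
print(sorted(citing_documents("seed")))
```

The backward search needs only the reference lists of the documents themselves; the forward search needs an index that has been built over many citing documents.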

211

***-

Information retrieval:
using citations (scheme)
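(Scheme: a time axis with past, now and future. Starting from the seed
document, snowball citation searching leads back to older publications,
while citation indexing leads forward to more recent publications
that cite the seed document.)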

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
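
The variants listed above can be generated with a small helper. This is a sketch in Python; the function name url_variants is hypothetical, and it assumes that the default document of the site is called index.html (or index.htm), which depends on the configuration of the web server.

```python
def url_variants(url: str) -> list[str]:
    """Return the common spellings under which the same page may be linked."""
    base = url
    # Strip an explicit default document name, if present (assumption: index.html / index.htm).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link query, with the syntax that the chosen search system requires.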

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
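
The difference between the two approaches can be sketched in Python on a miniature, invented "web". The pages in PAGES and the function names are hypothetical; a real search engine works on its own index, not on raw HTML as shown here.

```python
import re

TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"

# Hypothetical miniature "web": page URL -> HTML source.
PAGES = {
    "http://a.example/": '<a href="' + TARGET + '">paper</a>',
    "http://b.example/": "See " + TARGET + " for details.",
    "http://c.example/": "Nothing relevant here.",
}

HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def pages_linking_to(target: str) -> list[str]:
    """Specialised link search: look only inside hyperlink (href) fields."""
    return sorted(url for url, html in PAGES.items() if target in HREF.findall(html))


def pages_mentioning(target: str) -> list[str]:
    """'Normal' search: look for the URL as an exact string anywhere in the page."""
    return sorted(url for url, html in PAGES.items() if target in html)


print(pages_linking_to(TARGET))
print(pages_mentioning(TARGET))
```

In this sketch the normal string search also finds the page that mentions the URL in its text without making a hyperlink, which the link search misses.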

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
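
A minimal sketch in Python of the two components, under stated assumptions: the pages in CRAWLED stand for documents that a robot has already collected (they are invented here), and build_index and search are hypothetical names. The point is that a query is answered from the prebuilt index, not by visiting the live Internet.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
CRAWLED = {
    "http://example.org/sea": "Marine science studies the sea and the ocean.",
    "http://example.org/fish": "Fishing in the sea.",
    "http://example.org/lib": "A university library catalogue.",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it (inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(CRAWLED)
print(search(INDEX, "sea"))
print(search(INDEX, "sea ocean"))
```

Because only the index is consulted, a page that changed after the robot's last visit is still found (or missed) on the basis of its old text.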

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
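
The idea of ranking by links can be illustrated with a strongly simplified, PageRank-style calculation in Python. This is only a sketch under assumptions: the link structure in LINKS is invented, the damping value 0.85 and the number of rounds are arbitrary choices, and the actual algorithm used by Google is more elaborate and not described on these slides.

```python
# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}


def link_scores(links: dict[str, list[str]], damping: float = 0.85,
                rounds: int = 50) -> dict[str, float]:
    """Give each page a score that grows with the number and the score of pages linking to it."""
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            # A page without outgoing links spreads its score over all pages.
            receivers = targets if targets else pages
            share = damping * score[page] / len(receivers)
            for target in receivers:
                new[target] += share
        score = new
    return score


SCORES = link_scores(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this sketch page c scores higher than a and b because two pages point to it, and page d scores higher than a and b although only one page points to it, because that page (c) is itself "important".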

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
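
What such an automatic expansion amounts to can be sketched in Python. The synonym table SYNONYMS is invented for illustration, the function name expand_tilde is hypothetical, and the way Google handles the tilde internally is not known from these slides; the sketch only shows the principle of replacing a marked word by an OR group.

```python
# Hypothetical synonym table; a real search engine uses its own, much larger one.
SYNONYMS = {
    "sea": ["ocean", "marine"],
}


def expand_tilde(query: str) -> str:
    """Rewrite each ~word as a parenthesised OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            group = [word] + SYNONYMS.get(word.lower(), [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_tilde("fish ~sea pollution"))
```

Words without a tilde are passed on unchanged, so the searcher decides which concept is broadened.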

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
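
Recall and precision, as used on this slide, can be made concrete with a small calculation. The sketch below assumes the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers and the two result sets are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: four documents are truly relevant to the question.
RELEVANT = ["a", "b", "x", "y"]

# A general index returns many hits, few of them relevant.
general = precision_recall(["a", "b", "c", "d", "e", "f", "g", "h"], RELEVANT)

# A specialised index returns fewer hits, most of them relevant.
specialised = precision_recall(["a", "b", "x", "q"], RELEVANT)
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes; real systems will not always behave this way.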

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
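
To show what truncation would do if it were offered, the following minimal sketch expands a word stem against a small, made-up vocabulary and joins the variants with OR. The vocabulary and the function name are assumptions; a searcher without a truncation operator would have to type such variants by hand.

```python
def truncate(stem, vocabulary):
    """Return all vocabulary words that start with the stem (like 'librar*')."""
    return sorted(word for word in vocabulary if word.startswith(stem))

# Hypothetical vocabulary; a real index would supply its own word list.
VOCAB = ["library", "libraries", "librarian", "liberty", "catalogue"]

variants = truncate("librar", VOCAB)
query = " OR ".join(variants)
```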

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
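
A proximity operator such as NEAR checks whether two words occur within a given number of positions of each other. The sketch below is a simplified illustration under that assumption; it splits on whitespace only, ignores punctuation, and the default distance of 3 words is an arbitrary choice.

```python
def near(text, word_a, word_b, max_distance=3):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

SENTENCE = "open access to scientific information"
close = near(SENTENCE, "open", "scientific")      # 3 positions apart
too_far = near(SENTENCE, "open", "information")   # 4 positions apart
```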

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
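
Clustering on the fly can be sketched in a few lines: the grouping is computed from the retrieved titles only, after the search, without any pre-built classification. The method below (grouping each hit under its most widely shared non-query word) is a deliberately simple stand-in and is NOT the algorithm used by Vivisimo; the result titles and the stopword list are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster(results, query):
    """Group result titles under the most widely shared non-query word."""
    skip = STOPWORDS | set(query.lower().split())
    token_sets = [set(title.lower().split()) - skip for title in results]
    # Count in how many results each remaining word occurs.
    doc_freq = Counter(word for tokens in token_sets for word in tokens)
    clusters = defaultdict(list)
    for title, tokens in zip(results, token_sets):
        # Pick the word shared by most results; break ties alphabetically.
        label = min(tokens, key=lambda w: (-doc_freq[w], w)) if tokens else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical hits for the ambiguous query "pascal".
RESULTS = [
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Blaise Pascal philosopher biography",
    "Pascal philosopher and mathematician",
]
groups = cluster(RESULTS, "pascal")
```

With these four titles the hits separate into a "programming" group and a "philosopher" group, which mirrors the kind of disambiguation the slide describes.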

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
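
The two ideas above, querying several sources at once and removing repeated results, can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists; a real meta-search system would send the query over the network to each primary search system instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems; each returns ranked URLs.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the result lists in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:      # drop results already delivered
                seen.add(url)
                merged.append(url)
    return merged

merged = meta_search("open access", [engine_a, engine_b])
```

Here duplicates are detected by exact URL match only; recognising that two different URLs lead to the same document would need additional logic.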

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
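
One simple way to track modifications of an existing page is to store a fingerprint of its content and compare it with a fingerprint taken later. The sketch below assumes the page text has already been downloaded and uses SHA-256 from the Python standard library; the function names and sample texts are illustrative, and the tracking services mentioned here may work quite differently.

```python
import hashlib

def fingerprint(page_text):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(old_fingerprint, new_page_text):
    """True if the page content differs from the stored version."""
    return fingerprint(new_page_text) != old_fingerprint

# Hypothetical page content at two moments in time.
stored = fingerprint("Library opening hours: 9-17")
same = has_changed(stored, "Library opening hours: 9-17")
updated = has_changed(stored, "Library opening hours: 9-19")
```

Only the fingerprint needs to be kept between checks, not the full page.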

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors taken from a controlled list of words and
terms, the searcher can first consult one or several
thesaurus systems to find additional and more suitable
words and terms, and then use these to formulate a query
for that database (to increase recall and precision).
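
The use of a thesaurus to prepare a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny THESAURUS dictionary, its entries and the function names are invented here and do not come from any of the thesaurus systems mentioned in these slides. The sketch collects synonyms (UF) and narrower terms (NT) for each concept, combines the variants of one concept with OR, and combines the concepts with AND.

```python
# Toy thesaurus: every entry below is invented for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["coastal sea"],
        "RT": ["marine science"],
        "UF": ["ocean"],
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms reached via the given relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts, relations=("UF", "NT")):
    """OR the variants of each concept together, then AND the concepts."""
    groups = []
    for concept in concepts:
        variants = expand_term(concept, relations)
        # Multi-word terms are quoted so that they are searched as a phrase.
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(["sea", "pollution"])
print(query)
```

Adding synonyms and narrower terms mainly serves recall; choosing the most suitable term for each concept serves precision. The exact Boolean syntax differs between databases, so the generated string would have to be adapted to the system that is searched.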

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
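
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the reference lists are hypothetical; a real citation index is of course much larger and is built from the reference lists of published articles. Following reference lists leads to older documents (snowball effect), while inverting the reference lists gives a citation index that leads to more recent documents.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1980"],
    "d-2003": ["seed-2000"],
    "e-2005": ["d-2003", "b-1995"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, repeatedly (snowball effect)."""
    found, todo = [], [seed]
    while todo:
        doc = todo.pop(0)
        for cited in references.get(doc, []):
            if cited not in found and cited != seed:
                found.append(cited)
                todo.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


older = snowball("seed-2000", REFERENCES)
newer = build_citation_index(REFERENCES).get("seed-2000", [])
print("older:", older)
print("newer:", newer)
```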

211

***-

Information retrieval:
using citations (scheme)
[Scheme on a time axis (past, now, future): starting from the
seed document, snowball citation searching leads to the past,
and citation indexing leads to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
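
Producing these variants by hand is error-prone, so a small helper can generate them. The following Python sketch is only an assumption about how this could be done: the function name and the list of index file names (index.html, index.htm) are chosen here for illustration, and the link-search syntax in which the variants are then used still has to be taken from the help pages of the search system.

```python
def url_variants(url):
    """Return common spellings under which the same page may be linked."""
    # Drop the scheme so that the variants start with "//", as in the list above.
    rest = url.split("://", 1)[-1]
    variants = ["//" + rest]
    # A directory index page is often linked without the file name ...
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]
            variants.append("//" + rest)
            break
    # ... and with or without the trailing slash.
    if rest.endswith("/"):
        variants.append("//" + rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```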

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 106


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
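
The mechanism described above can be made concrete with a toy inverted index in Python. Everything in this sketch is a simplifying assumption: the three pages (URLs and texts) are invented and stand in for the pages collected by the robot, words are matched exactly, and a query returns only the pages that contain all query words. The point is that a query is answered from the index built beforehand, not by visiting the pages at query time.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by the robot; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Sea level and ocean currents",
    "http://example.org/c": "Freshwater biology",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "sea biology"))
```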

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered;
the first part of many files in other file formats is also
indexed, such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
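
The idea that a page ranks higher when many pages and "important" pages point to it can be illustrated with a strongly simplified link-analysis iteration in Python. This is a sketch only, not the ranking that Google actually applies: the link graph is invented, and the damping value 0.85 and the number of iterations are assumptions chosen for the illustration. Each page repeatedly passes a share of its own score to the pages it links to, so that a link from a high-scoring page counts for more than a link from a low-scoring one.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; a page inherits score from the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Divide the score of this page over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Invented link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this invented graph, "home" receives links from all other pages and ends up on top, while "orphan", to which no page links, ends up at the bottom.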

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
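
When queries are prepared by a program, the tilde can be added automatically. The helper below is a hypothetical Python sketch (the function name is invented here); it only builds the query string as described on this slide and does not contact any search system.

```python
def mark_for_expansion(query, words_to_expand):
    """Prefix the chosen query words with a tilde; leave the others unchanged."""
    wanted = {w.lower() for w in words_to_expand}
    marked = []
    for word in query.split():
        if word.lower() in wanted:
            marked.append("~" + word)
        else:
            marked.append(word)
    return " ".join(marked)


example = mark_for_expansion("word1 word2 word3 word4", ["word2"])
print(example)
```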

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
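
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant). The document ids and the two result sets are invented for the illustration and do not describe any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given sets of document ids."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 10 relevant documents exist on the topic.
relevant_docs = set(range(1, 11))
general_engine = {1, 2, 3, 20, 21, 22, 23, 24}   # 3 relevant out of 8 retrieved
specialised_engine = {1, 2, 3, 4, 5, 6, 30}      # 6 relevant out of 7 retrieved

general = precision_recall(general_engine, relevant_docs)
specialised = precision_recall(specialised_engine, relevant_docs)
print(general, specialised)
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes.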

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
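
To make clear what the missing features would do, here is a minimal sketch of truncation and of a NEAR operator applied to a single text. It is an illustration only, not a description of any search engine; the function names and the sample sentence are assumptions.

```python
import re

def tokens(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncated(stem, text):
    """True if any word in the text starts with stem (as in the query librar*)."""
    return any(word.startswith(stem.lower()) for word in tokens(text))

def near(term_a, term_b, text, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

text = "University libraries offer open access to many scientific journals."

print(matches_truncated("librar", text))          # librar* matches "libraries"
print(near("open", "journals", text))             # 5 words apart
print(near("university", "journals", text))       # 8 words apart
```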

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram with the elements: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
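
The idea of clustering results on the fly can be sketched as follows. This is NOT the Vivisimo algorithm, which is not described here; it is a deliberately simple stand-in that groups result titles under the most frequent word they share, using only the retrieved titles and no pre-built index. The stop-word list and the sample titles are assumptions.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}

def cluster_results(query, titles):
    """Group result titles under the most frequent non-query term they contain."""
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    term_sets = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        term_sets.append(words - query_terms - STOP_WORDS)
    # Count in how many titles each candidate label occurs.
    frequency = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        # Choose the term shared with the most results; ties go alphabetically.
        label = min(terms, key=lambda t: (-frequency[t], t)) if terms else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
clusters = cluster_results("pascal", titles)
print(clusters)
```

Everything is computed from the titles that came back for this one query, which is the sense in which the grouping happens "on the fly".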

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]
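
The "multi-threaded" principle of a client-based meta-search program can be sketched with a thread pool that sends one query to several search systems at the same time. The two search functions below are hypothetical stand-ins that return fixed lists; a real client would contact the search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_engine_a(query):
    return ["http://example.org/a1", "http://example.org/shared"]

def search_engine_b(query):
    return ["http://example.org/shared", "http://example.org/b1"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("open access", {"A": search_engine_a, "B": search_engine_b})
print(results)
```

Because the queries run in parallel, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.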

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
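
Integration of results with removal of repeated results can be sketched as below. The round-robin merging and the normalisation of URLs (only a trailing slash is ignored) are assumptions chosen for simplicity; real meta-search systems may rank and compare results quite differently.

```python
def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = results[rank].rstrip("/")  # crude normalisation
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://example.org/a", "http://example.org/b/"]
engine_2 = ["http://example.org/b", "http://example.org/c"]

merged = merge_results([engine_1, engine_2])
print(merged)
```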

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: the Internet comprises the WWW and other services (telnet, ftp, ...). Shown within it: static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; databases and file archives accessible through the Internet (CGI, ASP, ...); information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
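
How a tracking service can tell modified pages from new pages can be sketched with stored fingerprints. This is an illustration under simple assumptions, not the method of any particular alert service: the page contents are passed in as bytes (a real service would fetch them from the WWW), and the URLs are hypothetical.

```python
import hashlib

def fingerprint(page_content):
    """Return a SHA-256 digest of the page content (bytes)."""
    return hashlib.sha256(page_content).hexdigest()

def detect_changes(stored, current_pages):
    """Compare pages with stored fingerprints; return (new, modified) URL lists."""
    new, modified = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            modified.append(url)
        stored[url] = digest  # remember the latest version for the next check
    return new, modified

stored = {}
# First check: the page is seen for the first time.
detect_changes(stored, {"http://example.org/page": b"version 1"})
# Second check: the page has changed and another page has appeared.
new, modified = detect_changes(stored, {
    "http://example.org/page": b"version 2",
    "http://example.org/other": b"first seen",
})
print(new, modified)
```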

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
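The workflow above can be sketched in a few lines of Python. This is only an illustration: the tiny thesaurus, its entries and the function names are assumptions made for the example, and the Boolean syntax (OR, AND, quoted phrases) has to be adapted to the database that is actually searched.

```python
# Toy thesaurus: every entry and relation below is an illustrative assumption.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader terms
        "NT": ["bay", "gulf"],      # narrower terms
        "RT": ["coast"],            # related terms
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term plus thesaurus terms for the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts, relations=("UF", "NT")):
    """OR the variants of each concept, then AND the concepts together."""
    groups = []
    for concept in concepts:
        variants = expand_term(concept, relations)
        # Multi-word terms are quoted so that they are searched as phrases.
        quoted = ['"{}"'.format(v) if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms and narrower terms with OR tends to raise recall; choosing more suitable terms and combining concepts with AND tends to raise precision.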

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
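The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and reference lists are invented for the example; a real citation index is of course far larger and built by a database producer.

```python
# Hypothetical data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for ref in REFERENCES.get(doc, []):
                if ref != seed and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("seed"))                            # older publications
print(build_citation_index(REFERENCES)["seed"])    # more recent publications
```

Snowballing needs only the documents themselves, whereas forward searching needs the inverted structure, which is why a citation index is required for it.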

211

***-

Information retrieval:
using citations (scheme)
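[Scheme: a time axis running from Past through Now to Future,
with the seed document placed on it.
Snowball citation searching leads from the seed document
back to older publications;
citation indexing leads from the seed document
forward to more recent publications that cite it.]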

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
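Generating these variants can be automated. The sketch below is a minimal Python illustration; the function name and the assumption that the default page is called index.html are ours, and some servers use other default file names.

```python
def link_search_variants(url, default_page="index.html"):
    """Return the three forms under which a site or directory is often linked."""
    base = url
    if base.endswith("/" + default_page):
        # Strip the default page name but keep the directory part.
        base = base[: -len(default_page)]
    base = base.rstrip("/")
    return [base + "/" + default_page, base + "/", base]


for variant in link_search_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be entered separately in the link search of the chosen system, using the query syntax described in its help pages.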

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
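The principle of such an index can be shown with a minimal Python sketch. The three pages and their URLs are invented, the "robot" is replaced by a ready-made dictionary of collected texts, and real systems add ranking, stemming and many other refinements that are left out here.

```python
import re

# Hypothetical pages that a crawler ("robot") might have collected: URL -> text.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "North Sea fishing",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words, sorted alphabetically."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "north sea"))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is what makes the search fast.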

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
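The idea that a page ranks higher when many pages and "important" pages point to it can be illustrated with a simplified, PageRank-style calculation in Python. This is a teaching sketch only and not the algorithm as actually run by Google: the link graph, the damping value of 0.85 and the number of iterations are assumptions chosen for the example.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["c"],
    "e": ["d"],
}


def rank_pages(links, damping=0.85, iterations=100):
    """Repeatedly pass each page's score on to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score


scores = rank_pages(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page c receives three links and page d two, one of which comes from the highly ranked c, so both end up far above the pages that receive no links at all.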

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
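For those who build queries in a script, the tilde notation shown above can be applied automatically. The helper below is a small Python sketch; its name and parameters are hypothetical and it only produces the query string, it does not contact any search engine.

```python
def mark_for_synonyms(query, words_to_expand):
    """Put a tilde in front of the selected words of a query."""
    wanted = {word.lower() for word in words_to_expand}
    marked = []
    for word in query.split():
        if word.lower() in wanted and not word.startswith("~"):
            marked.append("~" + word)
        else:
            marked.append(word)
    return " ".join(marked)


print(mark_for_synonyms("word1 word2 word3 word4", ["word2"]))
```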

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• All of these have been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• A famous system,
»because the search interface is part of the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered at http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that most other search systems do not
offer:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that most other search systems do not
offer:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that most other search systems do not
offer:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
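The terms recall and precision used above can be made concrete with a small calculation. The following is only a minimal sketch: the document identifiers and the two result lists are invented for illustration and do not come from any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are relevant for the information need.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d7", "d8", "d9", "d2"]      # invented result list
specialised_engine = ["d1", "d2", "d3", "d9"]        # invented result list

print("general:    ", precision_recall(general_engine, relevant_docs))
print("specialised:", precision_recall(specialised_engine, relevant_docs))
```

In this invented case the specialised system scores higher on both measures; with real systems the outcome depends on the query and on the collection.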

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
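To show what manual truncation means in retrieval systems that do support it, here is a minimal sketch. The `*` truncation symbol, the function names and the sample sentence are assumptions chosen for illustration; real systems use their own syntax.

```python
import re


def truncation_to_regex(term):
    """Translate a query term into a regular expression.

    A trailing '*' (an assumed truncation symbol) means: match every word
    that starts with the given stem. Without '*', only the exact word matches.
    """
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matches(term, text):
    """Return the distinct words in the text matched by the term, sorted."""
    pattern = truncation_to_regex(term)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


sample = "The library hired a librarian; both libraries share one catalogue."
print(matches("librar*", sample))   # truncated query term
print(matches("library", sample))   # exact query term
```

Without truncation or stemming, the searcher has to enter each word variant explicitly, for instance combined with OR.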

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
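A proximity operator such as NEAR can be pictured as a test on word positions. The sketch below is only an illustration under simple assumptions (whitespace tokenisation, a hypothetical `near` function and an invented sample sentence); it does not describe how any particular search system implements the operator.

```python
def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sample = ("Open access journals are listed in a directory of journals "
          "maintained by a university library.")

print(near(sample, "open", "journals", max_distance=2))
print(near(sample, "open", "library", max_distance=3))
```

A plain Boolean AND would accept both word pairs, because it ignores the distance between the words.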

96

**--

Meta- search systems:
scheme 1
[Diagram] User (In / Out) <-> Client computer + WWW client program
<-> WWW server computer <-> Internet / WWW
<-> WWW server computers with Internet search systems

97

**--

Meta- search systems:
scheme 2
[Diagram] User (In / Out) <-> Client computer + multi-threaded
Internet search client program <-> Internet / WWW
<-> WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram] User (In / Out) <-> Client computer + WWW client program
<-> WWW server computer <-> Internet / WWW
<-> WWW server computers with Internet search systems

100

**--

Meta- search systems:
relations
[Diagram] User -> an Internet meta-search system
-> Internet search system 1 (with its collected database 1)
and Internet search system 2 (with its collected database 2)
-> WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
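The idea of clustering results on the fly can be sketched as follows. This is a deliberately simple illustration, not the method used by Vivisimo: each result is labelled with the term (other than the query word) that it shares with the most other results. The stop word list, the function names and the sample snippets are all invented for this example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "by", "his"}


def tokens(snippet, query):
    """Lower-cased words of a snippet, without stop words and the query word."""
    words = {w.strip(".,;:").lower() for w in snippet.split()}
    return words - STOPWORDS - {query.lower(), ""}


def cluster(snippets, query):
    """Group snippets under the term they share with most other snippets."""
    token_sets = [tokens(s, query) for s in snippets]
    doc_freq = Counter(t for ts in token_sets for t in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, token_sets):
        # Most widely shared term; ties are broken alphabetically.
        label = max(sorted(ts), key=lambda t: doc_freq[t]) if ts else "other"
        clusters[label].append(snippet)
    return dict(clusters)


results = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his writings",
    "Free Pascal compiler for the programming language",
]

for label, group in cluster(results, "pascal").items():
    print(label, "->", group)
```

Nothing is computed before the search: the grouping uses only the text of the retrieved results, which is what "on the fly" means here.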

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: in the same time, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
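The integration step described above (querying several primary search systems in parallel and removing repeated results) can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists; a real meta-search system would send the query to remote search systems instead. The URL normalisation rule is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical primary search system A (returns a fixed list)."""
    return ["http://example.org/a", "http://example.org/b"]


def engine_b(query):
    """Hypothetical primary search system B (returns a fixed list)."""
    return ["http://example.org/b/", "http://example.org/c"]


def normalise(url):
    """Assumed rule: URLs differing only in case or trailing '/' are equal."""
    return url.rstrip("/").lower()


def meta_search(query, engines):
    """Query all engines in parallel, then merge the lists without repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    # Interleave by rank, so that the top result of each engine comes first.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


print(meta_search("open access", [engine_a, engine_b]))
```

The parallel calls correspond to the "multi-threaded" aspect: the total waiting time is roughly that of the slowest engine instead of the sum of all engines.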

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet]
• Internet services besides the WWW: telnet, ftp, ...
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files; PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
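The difference between a modified page and a new page can be illustrated with a small tracking routine. This is only a sketch of one possible approach (comparing content fingerprints); the function names and the sample pages are hypothetical, and the page contents are passed in directly rather than fetched from the WWW.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Short, fixed-length digest of the page content."""
    return hashlib.sha256(content).hexdigest()


def check_pages(snapshots, known):
    """Compare current page contents with the stored fingerprints.

    snapshots: dict mapping URL -> current content (bytes)
    known:     dict mapping URL -> fingerprint from the previous check;
               it is updated in place.
    Returns (new_urls, modified_urls).
    """
    new_urls, modified_urls = [], []
    for url, content in snapshots.items():
        digest = fingerprint(content)
        if url not in known:
            new_urls.append(url)
        elif known[url] != digest:
            modified_urls.append(url)
        known[url] = digest
    return new_urls, modified_urls


# Hypothetical pages, checked on two occasions.
known = {}
first = {"http://example.org/news": b"version 1"}
second = {"http://example.org/news": b"version 2",
          "http://example.org/report": b"first edition"}

print(check_pages(first, known))
print(check_pages(second, known))
```

An alert service would run such a check at regular intervals and send an email when either list is not empty.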

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
(scheme: relations between a term and other terms)
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
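The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus entries are invented, the function names expand_term and build_query are hypothetical, and the AND / OR query syntax is an assumption that must be checked against the help pages of the database that is actually searched.

```python
# Toy thesaurus: every entry and relation below is invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["water body"],    # broader term
        "NT": ["coastal sea"],   # narrower term
        "RT": ["tide"],          # related term
        "UF": ["ocean"],         # synonym (use(d) for)
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, thesaurus):
    """Join alternatives for one concept with OR and the concepts with AND."""
    groups = []
    for term in terms:
        alternatives = expand_term(term, thesaurus)
        # Multi-word terms are quoted so that they are searched as a phrase.
        quoted = ['"%s"' % alt if " " in alt else alt for alt in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"], THESAURUS))
```

Adding synonyms and narrower terms with OR aims at higher recall; combining the concepts with AND aims at keeping precision acceptable.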

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
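Both directions can be illustrated with a small Python sketch. The document identifiers and reference lists are invented, and the function names are hypothetical; a real citation index is of course built from the reference lists of many thousands of publications.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the older documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1990"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> sorted list of citing documents."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return {doc: sorted(citers) for doc, citers in cited_by.items()}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    to_visit = list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found


index = build_citation_index(REFERENCES)
print("older publications:", snowball("seed-2000", REFERENCES))
print("more recent citing publications:", index.get("seed-2000", []))
```

The length of the list returned for a document in the inverted index is the number of citations that the document has received within the indexed collection.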

211

***-

Information retrieval:
using citations (scheme)
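[Scheme: a time axis (past, now, future) around the seed document.
Snowball citation searching points from the seed document towards the past;
citation indexing points from the seed document towards the future.]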

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
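The variants listed above can be generated automatically. The following Python sketch is an assumption-laden illustration: the function name url_variants is hypothetical, and the default file names index.html and index.htm are only the most common cases. The exact link-search syntax of each search system still has to be taken from its help pages.

```python
def url_variants(url):
    """Return the spellings of a URL that a linking page might use."""
    # Drop the scheme so that only the host and path are compared.
    rest = url.split("://", 1)[-1]
    variants = [rest]
    # A default file name such as index.html is often left out of links.
    for default_name in ("index.html", "index.htm"):
        if rest.endswith("/" + default_name):
            rest = rest[: -len(default_name)]
            variants.append(rest)
            break
    # The trailing slash of a directory is often left out as well.
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print("//" + variant)
```

Each printed variant can then be used in a separate link search, and the results can be merged afterwards.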

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 108

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
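The mechanism can be sketched in Python as follows. This is a minimal illustration, not the implementation of any real search engine: the pages are invented and are assumed to have been collected already by the robot, and the function names are hypothetical. The sketch treats a query as an implicit AND of all its words, which is itself an assumption.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image archive",
    "http://example.org/c": "Open archives for physics",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words, sorted alphabetically."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine open"))
```

The query is answered from the prebuilt index only; no page is fetched from the Internet at query time, which is why such systems can respond quickly but may return outdated results.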

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine Alltheweb and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
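The two ranking criteria (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis sketch in Python. This is not Google's actual algorithm: the function name link_rank, the damping value 0.85, the number of iterations, and the small link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page passes its score on to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Invented link graph: pages a and b link to c, c links to d, e links to f.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["f"]}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this graph, pages d and f each receive exactly one link, but d scores higher because its link comes from page c, which itself receives two links.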

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
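
Recall and precision, the two quality measures named on this slide, can be computed as follows. This is a sketch with made-up document identifiers; the function name precision_recall is hypothetical. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: 4 documents retrieved, 3 relevant documents exist, 2 overlap.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this example 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).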

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
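
To illustrate what is missing: a sketch of right-hand truncation as offered by many classical retrieval systems. The asterisk as truncation symbol, the function name and the word list are assumptions made for this illustration; no particular system is described.

```python
def truncation_matches(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in vocabulary if w.lower().startswith(stem)]
    # Without the truncation symbol, only the exact word matches.
    return [w for w in vocabulary if w.lower() == pattern.lower()]

words = ["library", "libraries", "librarian", "liberty"]
matched = truncation_matches("librar*", words)
```

Without truncation, the searcher must type each of these word variants separately.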

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
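
To illustrate the first limitation: a sketch of what a proximity operator such as NEAR checks. The function name near and the default distance of 5 words are assumptions for this illustration; systems that offer NEAR each define their own distance.

```python
def near(text, term1, term2, max_distance=5):
    """True if term1 and term2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)

text = "Open access journals are indexed by several free databases."
```

In the sample sentence, "journals" and "databases" are 6 words apart, so they match with a maximum distance of 6 but not with a maximum distance of 3.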

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
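
The idea of clustering on the fly can be sketched as follows. This is NOT the method of Vivisimo, which is not described here; it is a naive illustration that groups result snippets under the most widely shared term, using only the retrieved snippets themselves. All names and the sample snippets are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def tokens(snippet, query):
    """Lower-cased words of a snippet, without stopwords and the query word."""
    words = {w.strip(".,;:!?").lower() for w in snippet.split()}
    return words - STOPWORDS - {query.lower()} - {""}

def cluster_results(snippets, query):
    """Group snippets under the most widely shared term each one contains."""
    doc_freq = Counter()
    for s in snippets:
        doc_freq.update(tokens(s, query))
    clusters = defaultdict(list)
    for s in snippets:
        candidates = [t for t in tokens(s, query) if doc_freq[t] > 1]
        # Most frequent term wins; ties are broken alphabetically.
        label = min(candidates, key=lambda t: (-doc_freq[t], t)) if candidates else "other"
        clusters[label].append(s)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and mathematician",
    "Pascal restaurant",
]
clusters = cluster_results(snippets, "pascal")
```

For the ambiguous query pascal, the sketch separates the snippets about the programming language from those about the philosopher, and puts the remaining snippet under "other".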

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
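
The time saving comes from sending one query to several search systems in parallel. A minimal sketch, assuming each search system can be called as a Python function; engine_a and engine_b are hypothetical stand-ins that do not access the network.

```python
from concurrent.futures import ThreadPoolExecutor

def meta_search(query, engines):
    """Send one query to all engines in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}

# Hypothetical stand-ins for real search systems (no network access).
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]

def engine_b(query):
    return [f"http://b.example/{query}/1"]

results = meta_search("pascal", {"A": engine_a, "B": engine_b})
```

The user writes one query and gets the results of every engine back in a single structure, which corresponds to the single user interface mentioned above.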

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
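
A sketch of such an integration with removal of repeated results. The URL normalisation is deliberately crude and is an assumption of this illustration (lower-casing a whole URL can merge pages that are in fact different); real meta-search systems may work quite differently.

```python
def normalise(url):
    """Crude URL normalisation: lower-case, drop scheme, 'www.' and trailing '/'."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several systems, keeping first occurrences."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

list_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
list_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([list_a, list_b])
```

The two spellings of the DOAJ address are recognised as the same result, so the merged list contains three entries instead of four.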

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
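
How a tracking service can detect that an existing page was modified can be sketched with a fingerprint of the page text. This is an illustration under assumptions: the function names are hypothetical, the page text is passed in as a string (fetching it is left out), and real services may compare pages in other ways.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 fingerprint of the page text, ignoring whitespace layout."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_changed(old_fingerprint, page_text):
    """Compare the stored fingerprint with that of the current page text."""
    return fingerprint(page_text) != old_fingerprint

# Fingerprint stored at the first visit of the page.
stored = fingerprint("Open  access\njournals")
```

At a later visit, the service recomputes the fingerprint; a different value means that the text has been modified, while a change in line breaks or spacing alone is ignored.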

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic: ?

To find book titles published before 1990: ?

To find a book title through a title search: ?

To find the price of a book: ?

To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
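
A sketch of how a stored interest profile can be matched against newly published books. The data structures, field names and sample books are hypothetical; the alert services of real bookshops are not described here.

```python
def matches_profile(book, profile):
    """True if a title word is a profile keyword or the category is selected."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = book.get("category") in profile["categories"]
    return keyword_hit or category_hit

def new_book_alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]

profile = {"keywords": ["oceanography"], "categories": ["Library science"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Cataloguing Rules", "category": "Library science"},
    {"title": "Cooking", "category": "Food"},
]
alerts = new_book_alerts(new_books, profile)
```

The first book matches through a keyword, the second through its subject category, and the third is not reported.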

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also offers audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
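
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus `THESAURUS`, its entries, and the helper names `expand_term` and `build_query` are hypothetical, and the AND/OR syntax is a common convention that may differ from the query syntax of the database you actually search.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["lagoon"], "RT": ["coast"]},
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the thesaurus terms found for the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts, thesaurus, relations=("UF", "NT")):
    """Combine the terms of one concept with OR, and the concepts with AND."""
    groups = []
    for concept in concepts:
        terms = expand_term(concept, thesaurus, relations)
        # Quote multi-word terms so that they are searched as a phrase.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        group = " OR ".join(quoted)
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)


print(build_query(["sea", "pollution"], THESAURUS))
# (sea OR ocean OR lagoon) AND pollution
```

Adding synonyms and narrower terms with OR tends to raise recall; combining separate concepts with AND tends to raise precision.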

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
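
The two directions of citation searching can be sketched with a small, invented data set. Everything here is an assumption made for illustration: the dictionary `REFERENCES` and the function names are hypothetical, and a real citation index works on far larger data.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in references.get(doc, []):
                if cited != seed and cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def citing_documents(target, references):
    """Act like a citation index: list documents whose references include `target`."""
    return sorted(doc for doc, cited in references.items() if target in cited)


print(snowball("seed", REFERENCES))          # older publications
print(citing_documents("seed", REFERENCES))  # more recent publications
```

The length of the list returned by `citing_documents` is the number of citations received by the seed document within this small data set.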

211

***-

Information retrieval:
using citations (scheme)
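(Scheme: a time axis labelled Past, Now and Future. Starting from the
seed document, snowball citation searching leads back to older
publications, and citation indexing leads forward to more recent ones.)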

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
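
Generating these variants by hand is error-prone, so a small helper can list them. This is a sketch only: the function name `url_variants` and the list of default file names are assumptions, and each search system still needs its own query syntax around the resulting strings.

```python
def url_variants(url, default_pages=("index.html", "index.htm")):
    """Return the spellings under which the same page may be linked.

    `default_pages` is an assumed list of default file names of a web server.
    """
    # Drop the scheme and surrounding slashes, keeping "host/path".
    rest = url.split("://", 1)[-1].strip("/")
    head, _, last = rest.rpartition("/")
    if head and last in default_pages:
        # Directory reached through its default page: three spellings.
        return [f"//{head}/{last}", f"//{head}/", f"//{head}"]
    if head and "." in last:
        # A specific file: only one spelling.
        return [f"//{rest}"]
    # A directory or a bare host name: with and without trailing slash.
    return [f"//{rest}/", f"//{rest}"]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```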

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 109

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines
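
How a meta-search engine might combine the result lists of several underlying search engines can be sketched as follows. This is an assumption-laden illustration: the function name `merge_results` is hypothetical, and reciprocal-rank scoring is just one simple merging rule, not the documented method of any particular meta-search system.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several engines, removing duplicates.

    A URL scores 1/rank in each list that contains it; the scores are summed.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a = ["u1", "u2", "u3"]  # invented result list of a first engine
engine_b = ["u2", "u4"]        # invented result list of a second engine
print(merge_results([engine_a, engine_b]))
```

A URL returned by several engines, such as `u2` here, moves towards the top of the merged list.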

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW, e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
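
The "simply stated" rule above can be sketched as a sort on the number of incoming links. This is only an illustration of that simplified rule, not of the link analysis that Google actually performs; the function name and the sample data are invented.

```python
def rank_by_inbound_links(entries, links):
    """Order directory entries by the number of distinct pages linking to them.

    `links` is a list of (source, target) pairs. Ties are broken
    alphabetically, which mirrors a purely alphabetical listing.
    """
    inbound = {entry: set() for entry in entries}
    for source, target in links:
        if target in inbound and source != target:
            inbound[target].add(source)
    return sorted(entries, key=lambda e: (-len(inbound[e]), e))


entries = ["a.example", "b.example", "c.example"]
links = [
    ("x.example", "b.example"),
    ("y.example", "b.example"),
    ("x.example", "c.example"),
    ("x.example", "b.example"),  # a repeated link is counted only once
]
print(rank_by_inbound_links(entries, links))
```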

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
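
The two components named above, the index built from page texts and the search system that consults it, can be sketched as follows. This is a minimal sketch under stated assumptions: the `pages` dictionary stands in for what a crawler ("robot") would have collected, the URLs and texts are invented, and real Internet indexes add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    `pages` maps a URL to the text that a crawler collected for it.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every query word (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Invented sample data standing in for crawled pages.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}
index = build_index(pages)
print(search(index, "marine sea"))
```

The query is answered from the prepared index only; no page on the Internet is visited at query time, which is why such systems can respond quickly.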

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
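
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified, textbook-style iteration. This is not Google's actual algorithm: the function name `link_rank`, the damping value and the tiny link graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a simplified link-based score for each page.

    `links` maps a page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented graph: "c" is linked by two pages; "d" is linked only by "c";
# "f" is linked only by the little-linked page "e".
links = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["f"], "d": [], "f": []}
rank = link_rank(links)
print(rank["c"] > rank["a"])  # many pages point to "c"
print(rank["d"] > rank["f"])  # an "important" page points to "d"
```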

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
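What the tilde operator amounts to can be sketched as follows. This is a minimal illustration, assuming a small hand-made synonym table; the table, the function name expand_query and the OR-based output notation are hypothetical and do not describe how Google implements the feature internally.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query):
    """Rewrite '~word' as '(word OR synonym1 OR ...)'; leave other words as-is."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No known synonyms: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("rent ~car brussels")
# expanded == "rent (car OR automobile OR vehicle) brussels"
```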

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company,
Yahoo!, since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several well-known, large Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
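Recall and precision, mentioned on this slide, can be computed as follows. This is a minimal sketch using the standard definitions (recall = relevant documents retrieved / all relevant documents; precision = relevant documents retrieved / all documents retrieved); the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
p, r = precision_recall(
    ["d1", "d2", "d3", "d9"],
    ["d1", "d2", "d3", "d4", "d5", "d6"],
)
# p == 0.75 and r == 0.5
```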

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
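To make clear what is missing, the sketch below shows what truncation and stemming do in retrieval systems that support them. It is an illustration only: the vocabulary is hypothetical, and naive_stem is a deliberately crude suffix stripper, not a real stemming algorithm.

```python
import fnmatch

def truncate_match(pattern, vocabulary):
    """Manual truncation: 'librar*' matches every indexed word with that prefix."""
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))

def naive_stem(word):
    """Very rough stemming: strip a few common English suffixes."""
    for suffix in ("ies", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty",
              "search", "searches", "searching"}
matches = truncate_match("librar*", vocabulary)
stems = {w: naive_stem(w) for w in ("searches", "searching", "search")}
```

With truncation, one query term covers library, libraries and librarian; with stemming, searches and searching are reduced to the same form as search.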

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
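A proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched like this. The function name near, the default distance of 3 words and the sample sentence are assumptions chosen for illustration.

```python
def near(text, word_a, word_b, max_distance=3):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

doc = "Open access journals make scientific articles available free of charge."
close = near(doc, "open", "journals")   # 2 words apart
far = near(doc, "open", "charge")       # 9 words apart
```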

96

**--

Meta-search systems:
scheme 1
[Diagram: User (in / out) at a client computer with a WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (in / out) at a client computer with a multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (in / out) at a client computer with a WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
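Clustering on the fly can be illustrated with a very small sketch that groups result snippets under headings formed by words they share. This is not Vivisimo's method; the stop-word list, the snippets and the function name cluster_results are assumptions made for the example.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on"}

def cluster_results(query, snippets, min_size=2):
    """Group result snippets under headings formed by words they share."""
    query_words = set(query.lower().split())
    groups = defaultdict(list)
    for snippet in snippets:
        words = {w.strip(".,;:").lower() for w in snippet.split()}
        # Query words and stop words cannot distinguish clusters.
        for word in words - STOPWORDS - query_words:
            groups[word].append(snippet)
    # Keep only headings shared by at least min_size results.
    return {h: s for h, s in groups.items() if len(s) >= min_size}

# Hypothetical result snippets for the ambiguous query "pascal".
results = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher: Pensées",
]
clusters = cluster_results("pascal", results)
```

Here the results fall under two headings, "programming" and "philosopher", computed only from the retrieved snippets, without any processing before the search.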

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (in / out) at a client computer with a multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
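The time saving comes from sending one query to several search systems at the same time. A minimal sketch of this multi-threaded approach follows; engine_a and engine_b are hypothetical stand-ins that return fixed lists, where a real meta-search system would contact actual search services over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines; each returns a list of URLs.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]

def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

answers = meta_search("open access", {"A": engine_a, "B": engine_b})
```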

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
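Integration with removal of repeated results can be sketched as follows. The round-robin interleaving and the crude URL normalisation (lower-casing and dropping a trailing slash) are assumptions chosen for simplicity; real meta-search systems may merge and rank quite differently.

```python
def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation so that trivial variants count as repeats.
                key = results[rank].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
merged = merge_results([
    ["http://example.org/a", "http://example.org/b"],
    ["http://example.org/b/", "http://example.org/c"],
])
```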

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram labels: telnet, ftp, ...; Internet / WWW; databases and file archives accessible through the Internet (CGI, ASP, ...); static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the suitable programs that can be loaded on
your client workstation
(example: the advanced version of Copernic, which is not
available free of charge)
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
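Both kinds of tracking can be sketched with stored content fingerprints: a page whose fingerprint differs has been modified, and a page without a stored fingerprint is new. This is only an assumed, simplified mechanism; the page texts are hypothetical, and a real service would download the pages and probably ignore trivial changes.

```python
import hashlib

def fingerprint(page_text):
    """Hash of the page content; a different hash means the page changed."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_for_updates(stored, current_pages):
    """Compare current pages with stored fingerprints.

    Returns (changed_urls, new_urls) and updates 'stored' in place.
    """
    changed, new = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return changed, new

# Hypothetical snapshots taken on two different days.
stored = {}
check_for_updates(stored, {"http://example.org/a": "version 1"})
changed, new = check_for_updates(
    stored,
    {"http://example.org/a": "version 2", "http://example.org/b": "hello"},
)
```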

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are even available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
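Matching new books against a stored interest profile can be sketched as below. The record fields (title, category), the profile layout and the function name matches_profile are hypothetical; they do not describe how any particular bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """True if the book fits the stored keywords or subject categories."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    keyword_hit = any(k.lower() in title_words
                      for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

# Hypothetical interest profile and newly published books.
profile = {"keywords": ["aquaculture"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Aquaculture: principles and practice", "category": "Fisheries"},
    {"title": "Coastal ecosystems", "category": "Marine biology"},
    {"title": "Medieval history", "category": "History"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```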

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning
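
The relations in this scheme can be stored as one small record per term. The following is only a minimal sketch: the class name `ThesaurusEntry`, its field names, and the sample terms are assumptions made for illustration and are not taken from any real thesaurus system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThesaurusEntry:
    """One term with the standard thesaurus relations from the scheme."""
    term: str
    broader: List[str] = field(default_factory=list)   # BT = Broader Term
    narrower: List[str] = field(default_factory=list)  # NT = Narrower Term
    related: List[str] = field(default_factory=list)   # RT = Related Term
    used_for: List[str] = field(default_factory=list)  # UF = Use(d) For (synonyms)

    def all_linked_terms(self) -> List[str]:
        """Return every term one step away, in the order BT, NT, RT, UF."""
        return self.broader + self.narrower + self.related + self.used_for


# Hypothetical sample entry, for illustration only
sea = ThesaurusEntry(
    term="sea",
    broader=["body of water"],
    narrower=["coastal sea"],
    related=["ocean"],
    used_for=["seas"],
)
```

Calling `sea.all_linked_terms()` gives the candidate words that a searcher could consider adding to a query.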

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
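
The procedure described above (consult a thesaurus first, then formulate the query) can be sketched as follows. This is an illustration under assumptions: the function name `expand_query`, the sample terms, and the `AND` / `OR` syntax are hypothetical, and the real operators depend on the database that is searched.

```python
def expand_query(concepts, thesaurus):
    """Build a Boolean query: terms for one concept are joined by OR,
    the different concepts are joined by AND."""
    groups = []
    for concept in concepts:
        # Keep the original word first, then add thesaurus terms without duplicates.
        terms = [concept]
        for extra in thesaurus.get(concept, []):
            if extra not in terms:
                terms.append(extra)
        # Quote multi-word terms so that they are searched as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical terms, as a searcher might have collected them from a thesaurus
thesaurus = {"sea": ["ocean", "marine waters"], "pollution": ["contamination"]}
query = expand_query(["sea", "pollution"], thesaurus)
print(query)
```

Adding terms with OR mainly serves recall; combining the concepts with AND mainly serves precision.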

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
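
Both directions can be illustrated with a tiny set of reference lists. The sketch below is an assumption-laden example: the document identifiers and the function names `snowball` and `build_citation_index` are hypothetical and do not describe how any particular citation database works.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents that it cites.
references = {
    "D2005": ["D2001", "D1998"],
    "D2003": ["D2001"],
    "D2001": ["D1998", "D1995"],
    "D1998": ["D1995"],
    "D1995": [],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found, todo = [], list(references.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which documents cite it?"""
    cited_by = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            cited_by[cited].append(citing)
    return dict(cited_by)


older = snowball("D2001", references)                       # towards the past
newer = build_citation_index(references).get("D2001", [])   # towards the future
```

The length of `newer` is the number of citations received by the seed document within this small collection.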

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
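
The variants listed above can be generated from one known URL. This is only a sketch: the function name `url_variants` and the list of default file names (`index.html`, `index.htm`) are assumptions; a real web server may use other default names.

```python
def url_variants(url):
    """Return spellings of a URL that a link search may treat as different."""
    # Drop the scheme but keep the leading "//", as in the variants on the slide.
    bare = "//" + url.split("://", 1)[-1]
    variants = [bare]
    # Drop a default file name (assumed list of names).
    for default in ("index.html", "index.htm"):
        if bare.endswith("/" + default):
            bare = bare[: -len(default)]
            variants.append(bare)
            break
    # Add the form without the trailing slash.
    if bare.endswith("/"):
        variants.append(bare.rstrip("/"))
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
for v in variants:
    print(v)
```

Each variant can then be used in a separate link query, in the syntax required by the chosen search system.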

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
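
The "simply stated" rule above (more received links, higher rank) can be sketched as follows. This simplified count is an assumption made for illustration and is not Google's actual method; the function name `rank_by_inlinks` and the sample data are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Sort the sites of one directory category by the number of links received."""
    inlinks = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order, the ordering used in DMOZ itself.
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))


# Hypothetical link data: (linking page, linked site)
links = [
    ("p1", "b.org"), ("p2", "b.org"), ("p3", "c.org"),
    ("p4", "b.org"), ("p5", "c.org"),
]
ranking = rank_by_inlinks(["a.org", "b.org", "c.org"], links)
print(ranking)
```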

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
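
The two components can be illustrated with a very small in-memory index. This is a sketch under assumptions: the pages are given as a dictionary instead of being collected by a real robot, the names `build_index` and `search` are hypothetical, and query words are combined with AND purely as a choice for this example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words, sorted alphabetically."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical pages, standing in for what a "robot" has collected
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science",
}
index = build_index(pages)
results = search(index, "marine science")
print(results)
```

The query is answered from the prebuilt index only, which is why such a system can respond quickly without visiting the Internet at query time.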

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
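
The link-based ranking idea (many pointing pages, and "important" pointing pages, raise the rank) can be sketched with a simplified iterative score in the style of the published PageRank idea. This is an assumption-based illustration and not Google's actual algorithm: the function name `link_rank`, the damping value 0.85, the number of iterations, and the mini-web are all chosen for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or high-scoring pages, link to it. `links` maps page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical mini-web: C receives links from A, B and D
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["C"], "E": ["A"]}
scores = link_rank(links)
best = max(scores, key=scores.get)
```

In this mini-web, page C comes out on top, and page D (linked only by the high-scoring C) outranks A, which is linked only by the low-scoring E.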

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
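
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration under stated assumptions: the synonym table and the function name are hypothetical, and the sketch says nothing about how Google builds or applies its own synonym lists.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("buy ~cheap ~car brussels")
```

Words without a tilde are left untouched, so the searcher decides which concepts are broadened.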

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
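
Recall and precision, as used above, have standard definitions in information retrieval: precision is the share of retrieved documents that are relevant, and recall is the share of all relevant documents that were retrieved. A small worked sketch, with invented document identifiers:

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and the set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are actually relevant.
precision, recall = precision_and_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
```

Here two of the four retrieved documents are relevant (precision 2/4), and two of the three relevant documents were found (recall 2/3).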

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
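
To make clear what is missing, the following sketch shows what manual truncation does in systems that do offer it: a query such as librar* matches every word that starts with "librar". The documents and the function name are hypothetical; the wildcard matching relies on Python's standard fnmatch module.

```python
from fnmatch import fnmatchcase


def truncation_search(pattern, documents):
    """Return ids of documents containing a word matching a pattern like 'librar*'."""
    pattern = pattern.lower()
    matches = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # Strip simple punctuation before comparing each word with the pattern.
        if any(fnmatchcase(w.strip(".,;:!?"), pattern) for w in words):
            matches.append(doc_id)
    return sorted(matches)


docs = {
    1: "The library opens at nine.",
    2: "Librarians catalogue new books.",
    3: "Free access to journals.",
}
found = truncation_search("librar*", docs)
```

Without truncation, the query journal would not retrieve document 3, which contains only the plural "journals".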

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram with the elements: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
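
Clustering "on the fly" can be illustrated with a toy sketch that works only on the titles returned for one query. This is not Vivisimo's method, which is not described in these slides; the grouping rule (label each hit with its most widely shared non-query word), the stop-word list and the example titles are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Small, hypothetical stop-word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_hits(query, titles):
    """Group result titles under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = {w for w in title.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
    # Count in how many titles each candidate label occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        # Pick the most widely shared word; ties are broken alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(title)
    return dict(clusters)


hits = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Programming in Pascal for beginners",
    "Pascal philosopher and mathematician",
]
groups = cluster_hits("pascal", hits)
```

The grouping is computed from the retrieved hits alone, at search time, which is the sense in which no pre-processing of the documents is needed.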

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
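
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, not the method of any particular meta-search system: the interleaving order, the crude URL normalisation and the example URLs are assumptions.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Crude URL key: lower-case host, drop 'www.', the scheme and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(result_lists):
    """Interleave ranked result lists from several engines, keeping each URL once."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/", "http://example.org/paper"]
engine_b = ["http://example.org", "http://example.net/report"]
merged = merge_results([engine_a, engine_b])
```

The top result of both engines is the same page written in two ways, so it appears only once in the merged list.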

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is available through the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
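
How a tracking service can notice that a page was modified can be sketched with a stored fingerprint per page. This is only an assumption about one simple approach, not a description of any named service; fetching the page (for instance with urllib.request) is left out so that the sketch runs without network access, and the URL is hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """SHA-256 hash of the page text, with whitespace runs collapsed."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_change(url, page_text, stored_fingerprints):
    """Return 'new', 'changed' or 'unchanged', and remember the latest fingerprint."""
    current = fingerprint(page_text)
    previous = stored_fingerprints.get(url)
    stored_fingerprints[url] = current
    if previous is None:
        return "new"
    return "changed" if previous != current else "unchanged"


store = {}
url = "http://www.example.org/news.html"  # hypothetical page
first = check_for_change(url, "<p>Conference in May</p>", store)
second = check_for_change(url, "<p>Conference  in May</p>\n", store)
third = check_for_change(url, "<p>Conference in June</p>", store)
```

Collapsing whitespace before hashing avoids alerts for purely cosmetic changes; an alert service would send an email whenever the outcome is "new" or "changed".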

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops for books
(as well as for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
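
The use of a thesaurus before searching can be sketched in a few lines of Python. This is only a minimal illustration: the entry for "ocean" is invented toy data, the names ThesaurusEntry, expand_term and build_query are hypothetical, and the OR / AND query syntax is an assumption that must be checked against the help pages of the database you actually search.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the relation types BT, NT, RT and UF."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical toy data, invented for this illustration only.
THESAURUS = {
    "ocean": ThesaurusEntry(
        term="ocean",
        broader=["body of water"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_term(term: str, include_narrower: bool = True) -> list[str]:
    """Return the term plus its synonyms (UF) and, optionally, narrower terms (NT)."""
    entry = THESAURUS.get(term)
    if entry is None:
        return [term]  # unknown term: search it unchanged
    expanded = [entry.term] + entry.used_for
    if include_narrower:
        expanded += entry.narrower
    return expanded


def build_query(terms: list[str]) -> str:
    """Join the alternatives for each concept with OR, and the concepts with AND."""
    groups = []
    for term in terms:
        alternatives = " OR ".join(f'"{t}"' for t in expand_term(term))
        groups.append(f"({alternatives})")
    return " AND ".join(groups)


print(build_query(["ocean", "pollution"]))
```

Adding synonyms and narrower terms mainly raises recall; choosing the more suitable term from the thesaurus in the first place is what helps precision.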

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
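
The two directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists below are invented, and the function names are hypothetical; a real citation index is of course built from the reference lists of many thousands of publications.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed-2000": ["a-1995", "b-1998"],
    "b-1998": ["a-1995", "c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


def snowball(seed, references):
    """Follow reference lists back in time, collecting all older documents reached."""
    found, to_visit = set(), [seed]
    while to_visit:
        current = to_visit.pop()
        for cited in references.get(current, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


CITED_BY = build_citation_index(REFERENCES)
print("Older, via snowball:", sorted(snowball("seed-2000", REFERENCES)))
print("More recent, via citation index:", sorted(CITED_BY["seed-2000"]))
```

The snowball search needs only the documents themselves, whereas the search forward in time is possible only when somebody has built the inverted index.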

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
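
The variants listed above can be generated automatically before running the link searches. The following sketch is only an illustration: the function name url_variants is hypothetical, and it assumes that index.html (or index.htm) is the default page of the site, which is common but not guaranteed.

```python
def url_variants(url: str) -> list[str]:
    """Return the spellings of a site's address that other pages may link to."""
    base = url
    # Assumption: index.html or index.htm is the default page of the site.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then used in a separate link query, and the result sets are merged.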

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 111


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
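
The difference between alphabetical order and ordering by received links can be shown with a toy example. The site names and links below are invented, and simply counting incoming links is a deliberate simplification of whatever link analysis Google really applies.

```python
from collections import Counter

# Hypothetical directory category and hypothetical links (linking page, linked site).
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("page1", "beta.example"),
    ("page2", "beta.example"),
    ("page3", "gamma.example"),
    ("page1", "gamma.example"),
    ("page4", "beta.example"),
]


def rank_by_inlinks(entries, links):
    """Sort entries by number of received links; ties fall back to alphabetical order."""
    counts = Counter(target for _source, target in links)
    return sorted(entries, key=lambda site: (-counts[site], site))


print("Alphabetical:", sorted(ENTRIES))
print("By received links:", rank_by_inlinks(ENTRIES, LINKS))
```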

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
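
The mechanism described above can be reduced to a very small sketch. The three pages below stand in for what the "robot" has collected and are invented; the names tokenize, build_index and search are hypothetical. Real Internet indexes add relevance ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages already collected by the "robot": URL -> text.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index in advance: word -> set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the prepared index only, which is why such a system can respond quickly without visiting the Internet at search time, and also why its results can be out of date.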

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
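
The two ranking rules above can be illustrated with a simplified link-based scoring loop in the style of PageRank. This is only a sketch under stated assumptions: the four-page link graph, the function name `link_rank` and the damping value are invented for illustration and do not reproduce Google's actual implementation.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A and B both point to C; C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C scores higher than A and B because two pages point to it, and D scores higher still because the "important" page C points to it.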

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
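
What such an expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical and serve only as illustration; the synonyms that Google really uses are not known from this text.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each word marked with a leading tilde into an OR group of synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words in the query stay as typed.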

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
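
Recall and precision are used here in their usual information retrieval sense: recall is the fraction of all relevant documents that is retrieved, precision is the fraction of the retrieved documents that is relevant. A small worked sketch, with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall about 0.67).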

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
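
To make clear what is missing: in a system with truncation, a query term such as librar* would match all words that start with those letters. Without it, the searcher has to type the variants by hand, combined with OR. A sketch, assuming a tiny invented vocabulary and hypothetical function names:

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a right-truncated term like 'librar*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return [term]

def or_query(term, vocabulary):
    """Build the OR query a searcher would have to type by hand."""
    return " OR ".join(expand_truncation(term, vocabulary))

# Hypothetical vocabulary, for illustration only.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(or_query("librar*", vocabulary))
```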

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
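
A proximity operator checks whether two words occur close to each other in a document, not merely somewhere in the same document. A minimal sketch of such a check; the function name `near`, the distance limit and the sample text are assumptions for illustration:

```python
def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

doc = ("Open access journals are listed in a directory. "
       "Many libraries link to the directory.")
print(near(doc, "journals", "directory"))   # words are 5 positions apart
print(near(doc, "open", "libraries"))       # words are 9 positions apart
```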

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
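
Clustering "on the fly" means that the group labels are derived from the retrieved results themselves, at search time. The following is a deliberately naive sketch of that idea, not the method used by Vivisimo: the stopword list, the snippets and the labelling rule (the most widely shared word becomes the label) are all assumptions for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "for", "of", "the"}

def cluster_results(query, snippets):
    """Group result snippets under labels derived from the snippets themselves."""
    query_words = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        terms = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        term_sets.append(terms)
    # Document frequency: in how many snippets does each term occur?
    df = Counter(t for terms in term_sets for t in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        shared = [t for t in terms if df[t] >= 2]
        # Label = most widely shared term; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-df[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pensees by the philosopher Blaise Pascal",
    "Free Pascal compiler for the Pascal programming language",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, "->", members)
```

No categories are defined in advance: the two groups (about the person and about the programming language) emerge from the words that the results share.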

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
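
The principle of a multi-threaded search with integration of results can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists; a real meta-search system would query the primary search systems over the network, and its rules for recognising repeated results may differ from the simple URL normalisation assumed here.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b/", "http://EXAMPLE.org/c"]

def normalise(url):
    """Simple normalisation: lower-case host name, no trailing slash, no fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))

def meta_search(query, engines):
    """Query all engines in parallel, then merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

print(meta_search("open access", [engine_one, engine_two]))
```

The page that both engines return is shown only once in the merged list.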

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Elements: the Internet (telnet, ftp, ... and the WWW); static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes; databases and file archives accessible through the Internet (CGI, ASP,...); information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
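
How a tracking system can notice that a page is new or has been modified can be sketched with a stored fingerprint per page. This is only one possible approach, given as an assumption: the function names are invented, and the page text is passed in directly, where a real service would first download it.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_for_update(url, page_text, known):
    """Compare a page with its stored fingerprint.

    Returns 'new', 'changed' or 'unchanged' and stores the latest fingerprint.
    """
    digest = fingerprint(page_text)
    previous = known.get(url)
    known[url] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "changed"

known = {}  # fingerprints from earlier visits, keyed by URL
url = "http://www.example.org/news.html"  # hypothetical page
print(check_for_update(url, "Version 1 of the page", known))
print(check_for_update(url, "Version 1 of the page", known))
print(check_for_update(url, "Version 2 of the page", known))
```

An alert by email would then be sent only for the results 'new' and 'changed'.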

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
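
Matching new books against a stored interest profile can, in its simplest form, be sketched as a keyword comparison. The profile, the book titles and the function name are invented for illustration; how Amazon implements its alerts is not described here.

```python
def matching_titles(profile_keywords, new_titles):
    """Return the new titles that contain at least one keyword of the profile."""
    keywords = {k.lower() for k in profile_keywords}
    matches = []
    for title in new_titles:
        words = {w.strip(".,:;").lower() for w in title.split()}
        if keywords & words:
            matches.append(title)
    return matches

# Hypothetical interest profile and list of newly published books.
profile = ["oceanography", "fisheries"]
new_titles = [
    "Introduction to Oceanography",
    "A History of Printing",
    "Fisheries Management: An Overview",
]
print(matching_titles(profile, new_titles))
```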

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
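
The approach described above can be sketched in a few lines of Python. This is only an illustration: the thesaurus entries, the function name expand_query and the choice to combine the terms with OR are assumptions made for this example, not features of any particular thesaurus system or database.

```python
# Hypothetical mini-thesaurus using the relation codes BT, NT, RT and UF.
# The entries are invented for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "NT": ["coastal sea"],     # narrower terms
        "BT": ["water body"],      # broader terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and its thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for extra in entry.get(relation, []):
            if extra not in terms:
                terms.append(extra)
    # Terms that consist of several words are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


print(expand_query("sea", THESAURUS))
```

Adding synonyms (UF) mainly helps recall; replacing a broad term by a well-chosen narrower term (NT) mainly helps precision.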

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
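
The two directions of citation searching can be illustrated with a small Python sketch. The paper identifiers and the reference lists are invented for this example; a real citation index holds the same kind of data on a far larger scale.

```python
# Hypothetical reference lists: each paper maps to the older papers it cites.
REFERENCES = {
    "P2005": ["P2001"],
    "P2004": ["P2001"],
    "P2001": ["P1995", "P1990"],
    "P1995": ["P1990"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found, queue = set(), [seed]
    while queue:
        paper = queue.pop()
        for older in references.get(paper, []):
            if older not in found:
                found.add(older)
                queue.append(older)
    return found


def citing(seed, references):
    """Look forwards in time: which papers cite the seed document?"""
    return {paper for paper, cited in references.items() if seed in cited}


print(sorted(snowball("P2001", REFERENCES)))  # older publications
print(sorted(citing("P2001", REFERENCES)))    # more recent publications
```

The first function needs only the documents themselves; the second needs a citation index, because a document cannot list the later publications that cite it.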

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
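
Generating these variants by hand is error-prone, so a small helper can do it. The following Python sketch is only an illustration: the function name url_variants is hypothetical, and the link-search syntax that a particular search engine expects still has to be taken from its help pages.

```python
def url_variants(url):
    """Return the three spellings of a page address listed above."""
    # Drop the scheme ("http://") so that each variant starts with "//".
    rest = url.split("://", 1)[-1].lstrip("/")
    base = rest
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then used in a separate link search, and the result lists are combined.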

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 112

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
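
The mechanism can be illustrated with a toy version in Python. This is a minimal sketch under clear assumptions: the pages are given as a small dictionary instead of being collected by a robot, the URLs and texts are invented, and a query returns only pages that contain all query words. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a robot would have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def words_of(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index in advance: each word points to the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words_of(text):
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from the index only, without visiting any page."""
    words = words_of(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

Because the query is answered from the prepared index, the answer is fast, but it reflects the pages as they were when the robot last collected them.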

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
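
The slide does not give the details of the algorithm used by Google. As an illustration only, the following sketch computes a simplified, PageRank-style score on a small hypothetical link graph. The page names, the damping value 0.85 and the function name link_rank are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict that maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A and B point to C, C points to D, E points to A.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph, C scores higher than A because two pages point to it. D and A each receive one link, but D scores higher because its link comes from the "important" page C.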

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
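
As a small illustration, a query of this form can be built from a plain query and a list of words selected for synonym expansion. The helper name mark_for_synonyms is hypothetical; the sketch only builds the query text and does not contact any search system.

```python
def mark_for_synonyms(query, words_to_expand):
    """Put a tilde in front of the selected query words; leave the others unchanged."""
    chosen = {w.lower() for w in words_to_expand}
    return " ".join(
        "~" + word if word.lower() in chosen else word
        for word in query.split()
    )


print(mark_for_synonyms("word1 word2 word3 word4", ["word2"]))
```

A word that already carries a tilde is left as it is, so the function can be applied repeatedly to the same query.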

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
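
Recall and precision can be made concrete with a small calculation. In the following sketch, the document identifiers and the two result sets are invented for illustration. Recall is the fraction of all relevant documents that was retrieved; precision is the fraction of the retrieved documents that is relevant.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical situation: 8 relevant documents exist on the topic.
relevant = {f"doc{i}" for i in range(1, 9)}

# A general index retrieves 10 documents, of which 4 are relevant.
general = {f"doc{i}" for i in range(1, 5)} | {f"noise{i}" for i in range(1, 7)}

# A specialised index retrieves 8 documents, of which 6 are relevant.
specialised = {f"doc{i}" for i in range(1, 7)} | {"noise1", "noise2"}

print("general:", precision_recall(general, relevant))
print("specialised:", precision_recall(specialised, relevant))
```

In this invented case the specialised system scores better on both measures (precision 0.75 and recall 0.75, against 0.4 and 0.5).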

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
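
When a search system offers no truncation, the searcher can write out the word variants by hand. The sketch below illustrates this under two assumptions that are not taken from the slides: a small, hypothetical word list is available, and the target system accepts OR between parentheses.

```python
def expand_truncation(stem, vocabulary):
    """Return the vocabulary words that start with the given stem, sorted."""
    stem = stem.lower()
    return sorted({w.lower() for w in vocabulary if w.lower().startswith(stem)})


def or_group(words):
    """Join words into one parenthesised OR group for a query."""
    return "(" + " OR ".join(words) + ")"


# Hypothetical word list; in practice a dictionary or thesaurus could be used.
vocabulary = ["library", "libraries", "librarian", "liberty", "Librarians"]
variants = expand_truncation("librar", vocabulary)
query = or_group(variants) + " catalogue"
print(query)
```

This simulates what a truncated term such as librar* would do in a system that supports truncation.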

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
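
What a proximity operator such as NEAR does can be illustrated on a text that has already been retrieved. The function name near and the default distance of 5 words are assumptions for this sketch; it is not a feature of any particular search system.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True if both words occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


text = "Open access to scientific information is growing, and libraries provide guidance."
print(near(text, "open", "information"))
print(near(text, "open", "libraries"))
```

Such a check can be applied as a filter on downloaded results when the search system itself offers no proximity operator.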

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
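
How Vivisimo clusters results is not described here. Purely as an illustration of clustering on the fly, the following sketch groups a few invented result snippets under a heading, using only the retrieved text itself. The heading chosen for each snippet is its word that is shared by most snippets; the stop word list and the snippets are assumptions.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group snippets under the non-query word that most snippets share."""
    query_words = set(query.lower().split())
    tokenised = []
    counts = defaultdict(int)
    for snippet in snippets:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
        for w in words:
            counts[w] += 1
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Pick the word shared by most snippets; break ties alphabetically.
        label = min(words, key=lambda w: (-counts[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal programming language",
    "Blaise Pascal philosopher and mathematician",
    "Pensées of the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", snippets)
for heading, members in clusters.items():
    print(heading, "->", members)
```

For these snippets, two groups result: one about the programming language and one about the philosopher.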

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
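
A minimal sketch of such an integration, assuming two hypothetical search functions (engine_one and engine_two) that stand in for real primary search systems. The queries are sent in parallel and the result lists are merged by rank, with repeated addresses removed. The normalisation of addresses is a simplification chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(url):
    """Reduce trivial address variants so that duplicates can be recognised."""
    url = url.strip().lower()
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank so that the top hit of every engine comes first.
    for rank in range(max((len(r) for r in result_lists), default=0)):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical stand-ins for real search systems.
def engine_one(query):
    return ["http://www.example.org/", "http://example.org/page1"]


def engine_two(query):
    return ["http://example.org", "http://example.net/other"]


merged = meta_search("open access", [engine_one, engine_two])
print(merged)
```

The two engines return the same site under slightly different addresses; it appears only once in the merged list.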

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
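
The difference between modified and new pages can be sketched as a comparison of two snapshots of a set of pages. The snapshots below are invented, and the approach (comparing a fingerprint of the text of each page) is only one possible way to detect changes; it is not taken from any of the systems mentioned.

```python
import hashlib


def fingerprint(content):
    """Return a stable fingerprint of a page's text, ignoring extra whitespace."""
    normalised = " ".join(content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots (address -> text); report new and modified pages."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified


# Invented snapshots of three pages, taken at two moments.
previous = {
    "http://example.org/news": "Conference announced for May",
    "http://example.org/about": "About   this site",
}
current = {
    "http://example.org/news": "Conference programme now available",
    "http://example.org/about": "About this site",
    "http://example.org/papers": "List of open access papers",
}
new_pages, modified = detect_changes(previous, current)
print("new:", new_pages)
print("modified:", modified)
```

An alert service could run such a comparison at regular intervals and send the two lists to the subscriber by email.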

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
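
The query expansion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the miniature thesaurus, its entries and the helper name `expand_query` are hypothetical and are not taken from any of the thesaurus systems mentioned in these slides.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
# UF = synonyms, BT = broader terms, NT = narrower terms, RT = related terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Combine a term with its thesaurus terms into one OR query."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term from the thesaurus instead of the first word that comes to mind is what helps precision.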

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
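
The two directions of citation searching can be sketched as follows. The citation data and the function names `snowball` and `cited_by` are hypothetical; a real citation index holds millions of such links, but the principle is the same.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites):
    """Follow reference lists backwards in time (snowball effect)."""
    found, to_visit = set(), [doc]
    while to_visit:
        for ref in cites.get(to_visit.pop(), []):
            if ref not in found:
                found.add(ref)
                to_visit.append(ref)
    return found


def cited_by(doc, cites):
    """What a citation index offers: the more recent documents that cite doc."""
    return {citing for citing, refs in cites.items() if doc in refs}


print(sorted(snowball("seed", CITES)))
print(sorted(cited_by("seed", CITES)))
```

Snowballing needs only the reference lists of the documents themselves, whereas the forward direction requires an index built over many citing documents.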

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
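
A small helper can generate these variants so that none is forgotten. This is a sketch only: the function name `url_variants` is hypothetical, and the assumption that `index.html` or `index.htm` is the default document of a directory depends on how the web server is configured.

```python
def url_variants(url):
    """Return the spellings of a URL that may each occur in links to it."""
    variants = [url]
    for default_page in ("index.html", "index.htm"):
        if url.endswith("/" + default_page):
            directory = url[: -len(default_page)]  # keeps the trailing slash
            variants += [directory, directory.rstrip("/")]
            break
    else:
        # No default page in the URL: toggle the trailing slash.
        if url.endswith("/"):
            variants.append(url.rstrip("/"))
        else:
            variants.append(url + "/")
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then submitted as a separate link query, using the syntax of the chosen search system.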

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
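
The difference between the two orderings can be illustrated with a toy example. The site names, the link data and the function `rank_by_inlinks` are hypothetical, and a plain count of incoming links is a deliberate simplification: the actual link analysis by Google weights links rather than merely counting them.

```python
from collections import Counter

# Hypothetical directory entries, in the alphabetical order of the source directory.
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]

# Hypothetical links found on the WWW, as (linking page, linked site) pairs.
LINKS = [
    ("page1", "gamma.example"),
    ("page2", "gamma.example"),
    ("page3", "gamma.example"),
    ("page1", "alpha.example"),
]


def rank_by_inlinks(entries, links):
    """Sort entries by number of incoming links, most linked first."""
    counts = Counter(target for _, target in links)
    # Ties fall back to alphabetical order.
    return sorted(entries, key=lambda site: (-counts[site], site))


print(rank_by_inlinks(ENTRIES, LINKS))
```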

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
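
The two components can be sketched with a tiny inverted index. The pages, the URLs and the function names `build_index` and `search` are hypothetical; real systems add ranking, stemming and far larger data structures, so this only shows the principle of searching an index instead of the live Internet.

```python
from collections import defaultdict

# Hypothetical pages as a crawler ("robot") might have collected them.
PAGES = {
    "http://example.org/a": "open access journals in marine science",
    "http://example.org/b": "marine biology image database",
}


def build_index(pages):
    """Build an inverted index: each word maps to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

Because the query is answered from the prebuilt index, a page that changed after the last visit of the crawler can be retrieved on the basis of its old text.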

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
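
The idea of ranking by links can be sketched as follows. This is NOT Google's actual algorithm, only a simplified link-analysis iteration under stated assumptions: the link graph is invented, and the damping value 0.85 and the number of iterations are arbitrary choices for the illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """links: dict mapping a page to the list of pages it points to.
    Returns a dict mapping every page to a score; the scores sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * score[page] / n
                for target in pages:
                    new[target] += share
        score = new
    return score

# Hypothetical link graph: A, B and E point to C; C points to D; F points to G.
LINKS = {"A": ["C"], "B": ["C"], "E": ["C"], "C": ["D"], "F": ["G"]}
scores = rank_pages(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this invented graph, C scores well because many pages point to it, and D scores higher than G although both receive exactly one link, because the link to D comes from the “important” page C.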

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
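
What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and invented for the illustration, and the rewriting into OR-groups is only an assumption about the general principle, not a description of how Google implements the tilde operator.

```python
# Hypothetical synonym table, invented for illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each ~word as an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap ~car rental"))
# (cheap OR inexpensive OR affordable) (car OR automobile OR vehicle) rental
```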

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
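
The two measures named above can be made concrete with a small worked example. The document identifiers are invented; precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set, given the relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 5 documents are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
print(precision_recall(retrieved, relevant))
# (0.5, 0.4): 2 of the 4 retrieved are relevant, 2 of the 5 relevant are retrieved
```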

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
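
For readers not familiar with truncation, the following minimal sketch shows what a retrieval system that does support it might do with a right-truncated term. The vocabulary is invented and the asterisk as truncation symbol is an assumption; systems differ in the symbol they use.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' to the matching index words."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return sorted(word for word in vocabulary if word.startswith(prefix))
    # Without truncation symbol, only the exact word can match.
    return [term.lower()] if term.lower() in vocabulary else []

# Hypothetical vocabulary of an index.
VOCABULARY = {"library", "libraries", "librarian", "liberty", "index"}
print(expand_truncation("librar*", VOCABULARY))
# ['librarian', 'libraries', 'library']
```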

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
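
What a proximity operator does can be sketched in a few lines. The function name, the default distance of 5 words and the sample sentence are all assumptions made for this illustration; they do not describe any particular retrieval system.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

TEXT = "Open access journals are indexed by several search engines"
print(near(TEXT, "open", "journals", max_distance=2))   # True
print(near(TEXT, "open", "engines", max_distance=3))    # False
```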

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
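
A very rough sketch of clustering “on the fly” is given here. It is NOT the method of Vivisimo, which is not described in these slides; it only groups the titles of retrieved results under words that several titles share, using an invented result list based on the pascal example.

```python
from collections import defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to"}

def cluster_results(query, titles, min_size=2):
    """Group result titles under words that at least min_size titles share.
    Works only on the retrieved results, without any pre-processing."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    groups = defaultdict(list)
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        for word in sorted(words - STOPWORDS - query_words):
            groups[word].append(title)
    return {word: found for word, found in sorted(groups.items())
            if len(found) >= min_size}

# Hypothetical titles of results retrieved for the ambiguous query "pascal".
TITLES = [
    "Pascal programming tutorial",
    "Blaise Pascal the philosopher",
    "Free Pascal programming compiler",
    "Pascal philosopher and mathematician",
]
for heading, found in cluster_results("pascal", TITLES).items():
    print(heading, "->", found)
```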

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
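
The time saving comes from sending one query to several systems at the same time. A minimal sketch, assuming two hypothetical stand-in functions instead of real search systems (no network access is made):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote search systems (invented for illustration).
def engine_one(query):
    return [f"http://example.org/one/{query}"]

def engine_two(query):
    return [f"http://example.org/two/{query}"]

ENGINES = {"engine_one": engine_one, "engine_two": engine_two}

def meta_search(query, engines=ENGINES):
    """Send the same query to every engine in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

print(meta_search("ocean"))
```

The user formulates the query once; the waiting time is then roughly that of the slowest system, not the sum of all systems.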

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
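
How such an integration with removal of repeated results might work is sketched here. The normalisation rules and the round-robin interleaving are assumptions chosen for simplicity, and the URLs are invented; real meta-search systems may merge and rank differently.

```python
def normalize(url):
    """Crude normalisation: lowercase, drop the scheme, 'www.' and a trailing slash.
    (Lowercasing the whole URL is a simplification; paths can be case-sensitive.)"""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists round-robin and keep only the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
FROM_SYSTEM_1 = ["http://www.example.org/", "http://example.org/x"]
FROM_SYSTEM_2 = ["http://example.org", "http://example.org/y"]
print(merge_results([FROM_SYSTEM_1, FROM_SYSTEM_2]))
# ['http://www.example.org/', 'http://example.org/x', 'http://example.org/y']
```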

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also use other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
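
The principle behind such tracking can be sketched as follows: store a fingerprint of each page and compare it with the fingerprint of the next visit. The page contents below are invented and passed in as plain text; fetching the pages and sending the email alert are left out of this sketch.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length fingerprint of the text of a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """previous: dict url -> fingerprint stored at the last visit.
    current: dict url -> page text as found now.
    Returns (new URLs, changed URLs, updated fingerprints)."""
    new_urls, changed_urls, updated = [], [], dict(previous)
    for url, text in current.items():
        digest = fingerprint(text)
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != digest:
            changed_urls.append(url)
        updated[url] = digest
    return sorted(new_urls), sorted(changed_urls), updated

# Hypothetical pages at two moments in time.
stored = {"http://example.org/a": fingerprint("version 1"),
          "http://example.org/b": fingerprint("stable text")}
now = {"http://example.org/a": "version 2",
       "http://example.org/b": "stable text",
       "http://example.org/c": "brand new page"}
new_urls, changed_urls, stored = detect_changes(stored, now)
print(new_urls, changed_urls)
# ['http://example.org/c'] ['http://example.org/a']
```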

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
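
Matching new books against a stored interest profile of keywords can be sketched as follows. The book records and the profile are invented, and simple substring matching is only an assumption; it does not describe how any particular bookshop implements its alert service.

```python
def matching_books(profile_keywords, new_books):
    """Return the titles of new books whose title or subjects contain a profile keyword."""
    keywords = [keyword.lower() for keyword in profile_keywords]
    matches = []
    for book in new_books:
        description = (book["title"] + " " + " ".join(book.get("subjects", []))).lower()
        if any(keyword in description for keyword in keywords):
            matches.append(book["title"])
    return matches

# Hypothetical interest profile and hypothetical list of newly published books.
PROFILE = ["information retrieval", "thesaurus"]
NEW_BOOKS = [
    {"title": "Modern Information Retrieval", "subjects": ["search engines"]},
    {"title": "Gardening for Beginners", "subjects": ["plants"]},
    {"title": "Building Vocabularies", "subjects": ["thesaurus construction"]},
]
print(matching_books(PROFILE, NEW_BOOKS))
# ['Modern Information Retrieval', 'Building Vocabularies']
```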

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
(Scheme: a central Term and its relations to other terms)
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
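
A minimal sketch of this use of a thesaurus, in Python. The thesaurus fragment is hypothetical and purely illustrative, and the OR syntax of the resulting query is an assumption: check the help pages of the database that you actually want to search.

```python
# Hypothetical thesaurus fragment; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms (Use(d) For)
        "BT": ["body of water"],   # broader terms
        "NT": ["bay"],             # narrower terms
        "RT": ["coast"],           # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term from the thesaurus in place of the original one is what can improve precision.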

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
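
Both directions of citation searching can be sketched in Python. The citation data below are hypothetical; in practice the reference lists come from the documents themselves and the reverse lookup from a citation index.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites):
    """Follow reference lists back in time and collect every document reached."""
    found, stack = set(), [doc]
    while stack:
        for ref in cites.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found


def cited_by(doc, cites):
    """Citation index lookup: the more recent documents that cite doc."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older = snowball("seed", CITES)
newer = cited_by("seed", CITES)
```

The snowball search needs only the documents at hand, while the lookup of citing documents requires an index built over many publications, which is why a citation index is a separate service.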

211

***-

Information retrieval:
using citations (scheme)
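(Scheme: a time axis running from Past through Now to Future, with the
seed document at Now. Snowball citation searching leads from the seed
document back to older publications; citation indexing leads forward to
more recent publications that cite the seed document.)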

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
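
Generating these variants can be automated. The following Python sketch assumes that the default file name is index.html; the function name and the placeholder host are hypothetical, and some servers use other default file names.

```python
def url_variants(host, path):
    """Return common spellings of the same address, for link searches."""
    base = f"//{host}/{path.strip('/')}"
    # Assumption: index.html is the default file name on the server.
    return [base + "/index.html", base + "/", base]


variants = url_variants("web-server-computer.country", "website")
```

Each variant can then be submitted as a separate link query, and the results combined.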

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
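
The simplified statement above can be sketched in Python. This is only an illustration of ranking by the number of links received; the link data are hypothetical, and the actual link analysis used by Google is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (source page, target site) pairs.
LINKS = [
    ("p1", "site_a"), ("p2", "site_a"), ("p4", "site_a"),
    ("p1", "site_c"), ("p2", "site_c"),
    ("p3", "site_b"),
]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of links received, most linked first."""
    counts = Counter(target for _, target in links)
    # Ties are broken alphabetically, as in a plain directory listing.
    return sorted(sites, key=lambda site: (-counts[site], site))


ranking = rank_by_inlinks(["site_b", "site_c", "site_a"], LINKS)
```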

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

(Scheme: a global subject directory can lead to
• the complete WWW,
• a directory limited to sources in/of a country or region,
• a directory restricted to a specific subject domain (“portal”).)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically (machine-made) from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
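
A minimal sketch of this mechanism, assuming a few hypothetical pages that a "robot" has already collected. The function names and the example URLs are illustrative only; real Internet indexes are far larger and more refined.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages collected by the crawler (not real sources).
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering directory",
    "http://example.org/c": "Oceanography data for marine engineering",
}
index = build_index(pages)
hits = search(index, "marine oceanography")
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time.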

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
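
A simplified sketch of ranking by links, in the spirit of the well-known PageRank idea. This is an illustration under assumptions (a tiny hypothetical link graph, a damping value of 0.85), not the actual implementation used by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages on the basis of the links that point to them."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}
rank = link_rank(links)
```

In this toy graph, C scores higher than B because two pages point to it, and D scores higher than A although each has only one incoming link, because D is linked from the "important" page C.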

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
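
What such an expansion amounts to can be sketched as follows. The synonym table and the OR-group output are assumptions made for illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical, hand-made synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("rent ~car brussels")
# expanded == "rent (car OR automobile OR vehicle) brussels"
```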

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
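
Recall and precision can be computed as in the following sketch. The document identifiers and the two result lists are invented for illustration; they are not measurements of any real search system.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / retrieved; recall = relevant hits / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical situation: 4 relevant documents exist (d1..d4).
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_results = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_results, relevant)          # (0.25, 0.5)
specialised = precision_recall(specialised_results, relevant)  # (0.75, 0.75)
```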

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
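
What truncation means can be sketched as follows, assuming a small hypothetical vocabulary of indexed words and the asterisk as truncation symbol (the symbol differs between retrieval systems).

```python
def truncation_search(vocabulary, pattern):
    """Return all indexed words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.startswith(stem))
    # Without truncation symbol, only the exact word matches.
    return sorted(w for w in vocabulary if w == pattern.lower())

# Hypothetical vocabulary of an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
matches = truncation_search(vocabulary, "librar*")
# matches == ["librarian", "libraries", "library"]
```

Without such a feature, the searcher has to type the variants of a word explicitly in the query.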

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
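
To illustrate the first of these missing features: a proximity operator such as NEAR checks how far apart two words occur in a text. A minimal sketch, assuming a maximum distance counted in words (the function name and the default distance are illustrative):

```python
import re

def near(text, word1, word2, distance=3):
    """True when word1 and word2 occur within `distance` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in pos1 for j in pos2)

sentence = "Open access to scientific information sources"
close = near(sentence, "open", "scientific")   # positions 0 and 3
far = near(sentence, "open", "sources")        # positions 0 and 5
```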

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
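
A toy sketch of clustering results on the fly, using only the titles that come back for a query. The grouping rule (shared words in titles) and the example titles are assumptions made for illustration; this is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "and", "of", "a", "in", "for", "to"}

def cluster_results(titles, query):
    """Group result titles under the most frequent term that they share."""
    query_words = set(query.lower().split())

    def terms(title):
        words = set(re.findall(r"[a-z]+", title.lower()))
        return words - STOPWORDS - query_words

    # Count in how many titles each term occurs.
    doc_freq = Counter(t for title in titles for t in terms(title))
    clusters = defaultdict(list)
    for title in titles:
        shared = [t for t in terms(title) if doc_freq[t] >= 2]
        # Pick the most widely shared term; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical titles retrieved for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Turbo Pascal programming examples",
    "The life of Blaise Pascal",
]
clusters = cluster_results(titles, "pascal")
```

Here the results fall into a group labelled "blaise" and a group labelled "programming", without any processing of the documents before the search.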

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
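
The scheme above can be sketched as a small client program that sends one query to several search systems at the same time. The two engine functions are hypothetical stand-ins; a real client would contact each search system over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote search systems.
def engine_one(query):
    return [f"http://example.org/one/{query}"]

def engine_two(query):
    return [f"http://example.org/two/{query}"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the results in the same order as the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return dict(zip((e.__name__ for e in engines), result_lists))

results = meta_search("oceanography", [engine_one, engine_two])
```

Because the queries run in parallel, the total waiting time is roughly that of the slowest search system, not the sum of all.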

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
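
A minimal sketch of such an integration, assuming that each primary search system returns a ranked list of URLs. The normalisation rule (ignoring "www." and a trailing slash) is a simplifying assumption; real systems may compare results in other ways.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivially different spellings of a URL to one key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked lists and keep only the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

list_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
list_b = ["http://dogpile.com/", "http://www.mamma.com"]
merged = merge_results([list_a, list_b])
```

The two spellings of the Dogpile address are recognised as one result, so the merged list contains three items instead of four.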

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
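
The difference between a modified page and a new page can be sketched as follows. The stored digests and the page contents are hypothetical; a real alert service would fetch the pages from the WWW and send the result by email.

```python
import hashlib

def fingerprint(content):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored digests with fresh page contents.

    Returns (new_urls, changed_urls, updated_digests).
    """
    new_urls, changed_urls, updated = [], [], dict(previous)
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != digest:
            changed_urls.append(url)
        updated[url] = digest
    return new_urls, changed_urls, updated

# Hypothetical state stored after the previous visit.
previous = {
    "http://example.org/news": fingerprint("Old news text"),
    "http://example.org/about": fingerprint("About this site"),
}
# Hypothetical contents found during the current visit.
current_pages = {
    "http://example.org/news": "Fresh news text",
    "http://example.org/about": "About this site",
    "http://example.org/report": "A new report",
}
new_urls, changed_urls, updated = detect_changes(previous, current_pages)
```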

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
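A minimal sketch of this use of a thesaurus: look up a term, collect its synonyms (UF) and narrower terms (NT), and combine them into one OR query. The thesaurus entry, the function name and the query syntax are illustrative assumptions; real thesauri and search systems each have their own contents and syntax.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote each term so that multi-word terms are searched as phrases.
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("sea")
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term found in the thesaurus instead of the original one is the usual way to increase precision.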

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
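Both directions can be illustrated on a small citation graph. This is a sketch with made-up document identifiers: following reference lists leads backward to older publications (the snowball effect), while inverting the same data, as a citation index does, leads forward to more recent publications that cite the seed.

```python
from collections import deque

# Hypothetical data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, cites):
    """Backward search: follow reference lists, and the references of references."""
    found, seen = [], set()
    queue = deque(cites.get(seed, []))
    while queue:
        doc = queue.popleft()
        if doc in seen:
            continue
        seen.add(doc)
        found.append(doc)
        queue.extend(cites.get(doc, []))
    return found


def cited_by(seed, cites):
    """Forward search: documents whose reference list contains the seed."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


older = snowball("seed", CITES)
newer = cited_by("seed", CITES)
```

The backward search needs only the documents themselves, whereas the forward search needs a database that has already collected the reference lists of many publications, which is why a citation index is required for it.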

211

***-

Information retrieval:
using citations (scheme)
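Time axis: Past → Now → Future
• Seed document: the starting point
• Snowball citation searching: from the seed document towards the past
(older publications)
• Citation indexing: from the seed document towards the future
(more recent publications)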

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
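The three variants above can be generated mechanically before running link searches. A minimal sketch, assuming that the default page is called index.html or index.htm (other servers may use other names); the function name is hypothetical.

```python
def url_variants(url):
    """Return the forms of a URL that other pages may have linked to."""
    variants = [url]
    base = url
    # Drop a default page name, keeping the trailing slash.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    if base not in variants:
        variants.append(base)
    # Also try the form without the trailing slash.
    stripped = base.rstrip("/")
    if stripped and stripped not in variants:
        variants.append(stripped)
    return variants


variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant is then submitted as a separate link query, and the result sets are combined, because a search system may treat the three forms as different targets.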

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
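The "simply stated" version of this ranking can be sketched by counting inbound links and sorting on that count. This is a deliberately simplified illustration with made-up site names; the link analysis that Google actually applies is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs found on the WWW.
LINKS = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("d.example", "c.example"),
    ("c.example", "a.example"),
    ("a.example", "b.example"),
]


def rank_by_inbound_links(sites, links):
    """Sort directory entries by the number of links they receive, highest first.

    Ties are broken alphabetically, which is the only ordering that a purely
    alphabetical directory offers.
    """
    inbound = Counter(target for _source, target in links)
    return sorted(sites, key=lambda site: (-inbound[site], site))


ranking = rank_by_inbound_links(["a.example", "b.example", "c.example"], LINKS)
```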

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
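
The two components described above can be illustrated with a minimal sketch in Python. It is not the implementation of any real search engine: the pages are hypothetical stand-ins for what a robot would collect, and the rule that all query words must match is an assumption of this sketch.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a crawler ("robot") would collect.
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering and marine structures",
    "http://example.org/c": "Fishing news",
}
index = build_index(pages)
print(sorted(search(index, "marine engineering")))
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.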

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
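
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be sketched as an iterative computation. The code below is a simplified illustration of link-based ranking, not Google's actual algorithm; the link graph, the damping value and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C scores highest because three pages point to it, and A scores higher than B because the single page that points to A is the “important” page C.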

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
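
What such an automatic expansion amounts to can be shown with a small sketch. The thesaurus below is hypothetical and tiny, and the OR notation of the output is only an illustration; how Google selects and applies synonyms internally is not described in the source.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + thesaurus.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("rent ~car brussels"))
```

Only the word marked with a tilde is expanded; the other words of the query are passed on unchanged.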

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
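
The terms recall and precision used above have standard definitions in information retrieval: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 1/2) and two of the three relevant documents were found (recall 2/3).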

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
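
For readers who do not know the term: with manual truncation the searcher types only the beginning of a word, followed by a symbol such as *, and the system matches all words that start in that way. The sketch below illustrates this against a hypothetical index vocabulary; the * symbol and the function name are assumptions of the example.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a term such as 'librar*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return [term] if term in vocabulary else []


# Hypothetical vocabulary of an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
print(expand_truncation("librar*", vocabulary))
```

In a system without truncation, the searcher has to enter each of these word variants explicitly.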

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
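
How clustering on the fly can work in principle is shown in the sketch below. It is a deliberately naive illustration, not the method used by Vivisimo, which the source does not describe: each hit is simply filed under the word of its title that it shares with the largest number of other hits. The titles are hypothetical.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}


def words(text, ignore):
    """Return the distinct meaningful words of a title."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and w not in ignore}


def cluster_hits(query, hits):
    """Group hits under the most widely shared word of their titles."""
    ignore = set(re.findall(r"[a-z]+", query.lower()))
    counts = Counter()
    for title in hits:
        counts.update(words(title, ignore))
    clusters = defaultdict(list)
    for title in hits:
        candidates = words(title, ignore)
        if candidates:
            # Pick the word shared by most hits; break ties alphabetically.
            label = min(candidates, key=lambda w: (-counts[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical titles retrieved for the ambiguous query "pascal".
hits = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal the philosopher: life and works",
]
clusters = cluster_hits("pascal", hits)
for label, titles in clusters.items():
    print(label, titles)
```

The grouping is computed only from the hits of the current search, so no pre-processing of the whole document collection is needed.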

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
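
Such an integration with removal of repeated results can be sketched as follows. The merging rule (taking the ranked lists in turn) and the URL normalisation are assumptions of this example; real meta-search systems may combine results quite differently.

```python
def normalize(url):
    """Reduce trivial URL variants (case of host, trailing slash) to one form."""
    scheme, sep, rest = url.strip().partition("://")
    host, _, path = rest.partition("/")
    path = path.rstrip("/")
    return scheme.lower() + sep + host.lower() + ("/" + path if path else "")


def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping each URL once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://example.org/a", "http://example.org/b"]
engine_2 = ["http://EXAMPLE.org/b/", "http://example.org/c"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

The page that both systems returned appears only once in the merged list, in the spelling in which it was first encountered.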

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
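
The difference between a modified page and a new page can be made concrete with a small sketch that compares two snapshots of a set of pages. The snapshot data are hypothetical, and the use of a hash value as a fingerprint of a page is an assumption of this example, not a description of any particular alert service.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length signature of the page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots {url: text}; return (new URLs, modified URLs)."""
    new = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new, modified


# Hypothetical snapshots taken at two moments in time.
last_week = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About this site",
}
today = {
    "http://example.org/news": "Conference announced; programme available",
    "http://example.org/about": "About this site",
    "http://example.org/programme": "Programme of the conference",
}
new_pages, modified_pages = detect_changes(last_week, today)
print(new_pages, modified_pages)
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber.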

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
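
Simultaneous searching, and its dependence on the availability of the target systems, can be sketched as follows. The three target systems are hypothetical stand-ins implemented as local functions; a real service would contact remote catalogues, for instance through Z39.50, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogues, standing in for remote target systems.
CATALOGUE_A = ["Introduction to oceanography", "Civil engineering handbook"]
CATALOGUE_B = ["Oceanography and marine biology", "Fishing techniques"]


def search_a(query):
    return [title for title in CATALOGUE_A if query.lower() in title.lower()]


def search_b(query):
    return [title for title in CATALOGUE_B if query.lower() in title.lower()]


def search_unavailable(query):
    raise ConnectionError("target system not available")


def search_all(targets, query):
    """Send one query to every target in parallel; tolerate failing targets."""
    def run(item):
        name, search = item
        try:
            return name, search(query)
        except Exception as error:
            # Keep the error so that the user sees which target failed.
            return name, error

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(run, targets.items()))


targets = {
    "library A": search_a,
    "bookshop B": search_b,
    "library C": search_unavailable,
}
results = search_all(targets, "oceanography")
for name, outcome in results.items():
    print(name, outcome)
```

Only a plain keyword query is passed to every target, which reflects the limited search options mentioned above: a feature that one catalogue supports cannot be used if the others do not offer it.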

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations of a given Term to other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
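
As a rough sketch of this use of a thesaurus, the code below looks up a search term and builds a broader query from its synonyms (UF) and narrower terms (NT). The miniature thesaurus, the class `ThesaurusEntry` and the function `expand_query` are hypothetical and only serve as an illustration. The `OR` syntax is an assumption, as the actual query syntax depends on the target database.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical miniature thesaurus, for illustration only.
THESAURUS = {
    "ocean": ThesaurusEntry(
        broader=["water body"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["oceanography"],
        used_for=["sea"],
    ),
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and (optionally) its NTs."""
    entry = thesaurus.get(term)
    terms = [term]
    if entry is not None:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms mainly increases recall. Choosing the most suitable term from the thesaurus instead of the first word that comes to mind is what helps precision.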

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information is becoming available
online.
• A growing share of this online information is becoming
available free of charge (= open access).

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
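
The two directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists below are invented for the example. Following reference lists leads to older work (the snowball effect), while an inverted table, which is a very small "citation index", leads to newer work that cites the seed.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the documents it cites.
REFERENCES = {
    "D2005": ["Seed2000", "B1998"],
    "C2003": ["Seed2000"],
    "Seed2000": ["A1990", "B1998"],
    "B1998": ["A1990"],
    "A1990": [],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, transitively."""
    found, todo = set(), list(references.get(seed, []))
    while todo:
        doc = todo.pop()
        if doc not in found:
            found.add(doc)
            todo.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    index = defaultdict(set)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].add(citing)
    return index
```

With this data, `snowball("Seed2000", REFERENCES)` finds the older documents, and `build_citation_index(REFERENCES)["Seed2000"]` finds the more recent ones that cite the seed.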

211

***-

Information retrieval:
using citations (scheme)
Time axis: Past → Now → Future
Seed document
Snowball citation searching: from the seed document towards the past
Citation indexing: from the seed document towards the future

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
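
How such counts follow from the results of a citation search can be sketched as below. The papers, articles and author names are invented. The counts are only estimates, because they depend entirely on which citing papers the database covers.

```python
from collections import Counter

# Hypothetical data: which articles each citing paper refers to,
# and who wrote each cited article.
REFERENCE_LISTS = {
    "paper-1": ["art-A", "art-B"],
    "paper-2": ["art-A"],
    "paper-3": ["art-A", "art-C"],
}
AUTHORS = {"art-A": ["Jansen"], "art-B": ["Jansen", "Peeters"], "art-C": ["Peeters"]}


def citations_per_article(reference_lists):
    """Count, for each cited article, the number of citing papers."""
    counts = Counter()
    for cited in reference_lists.values():
        counts.update(set(cited))  # count each citing paper once per article
    return counts


def citations_per_author(reference_lists, authors):
    """Add up the article counts for every (co-)author of each article."""
    counts = Counter()
    for article, n in citations_per_article(reference_lists).items():
        for author in authors.get(article, []):
            counts[author] += n
    return counts
```

Here every co-author receives full credit for each citation, which is one possible convention among several.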

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
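
The variants listed above can be generated mechanically before running the link searches. The sketch below assumes that the default page of the directory is called `index.html`. This is an assumption, as other servers may use a different default file name. The function name `url_variants` is hypothetical.

```python
def url_variants(url):
    """Return the spellings of a URL that point to the same directory page."""
    rest = url.split("//", 1)[-1]          # drop "http:" or a bare "//"
    if rest.endswith("/index.html"):
        rest = rest[: -len("index.html")]  # keep the directory part only
    base = rest.rstrip("/")
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]
```

Each returned variant can then be submitted as a separate link query, using the syntax that the chosen search system documents in its help pages.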

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 116

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
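
The "simply stated" version of this ranking can be written as a short sketch. It counts distinct inbound links only. This is a simplification chosen for illustration, not Google's actual link analysis, and the site names and the function `rank_by_inbound_links` are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(directory_sites, links):
    """Sort directory entries by the number of distinct links they receive.

    links is an iterable of (source_page, target_site) pairs; ties are
    broken alphabetically, as in a plain alphabetical listing.
    """
    inbound = Counter(target for _source, target in set(links))
    return sorted(directory_sites, key=lambda site: (-inbound[site], site))
```

A site that receives no links at all simply ends up at the bottom of its category, in alphabetical order among the other unlinked sites.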

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and to locate
many items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
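
The two components described above (an index built automatically from page texts, and a search function on top of it) can be sketched in a few lines. This is a minimal illustration under simple assumptions: the pages are already collected in a dictionary (no real crawler is involved), the URLs are hypothetical, and all query words must occur in a page.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """pages: dict url -> text. Returns an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a "robot" would collect
pages = {
    "http://a.example/": "Marine science and oceanography",
    "http://b.example/": "Civil engineering and computer science",
}
index = build_index(pages)
print(search(index, "marine science"))
```

The query is answered from the index alone; the pages themselves are not visited at query time, which is the point made in the slides above.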

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
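
The two ranking factors above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis computation. This is a sketch in the style of published link-analysis algorithms, not the actual Google implementation; the damping value, the number of iterations and the page names are assumptions.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """graph: dict page -> list of pages it links to. Returns page -> score."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for p in pages:
                    new[p] += share
        scores = new
    return scores

# Hypothetical link graph
graph = {
    "a": ["target"],
    "b": ["target"],
    "c": ["target"],
    "target": ["a"],
}
scores = link_scores(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

In this small graph, "target" ranks first because many pages point to it, and "a" ranks above "b" and "c" because the important page "target" points to it.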

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
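
The effect of the tilde can be imitated outside Google with a small synonym table. The sketch below only illustrates the idea of automatic query expansion; how Google selects its synonyms is not described here, and the synonym table and function name are hypothetical.

```python
def expand_query(query, synonyms):
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonym known: keep the plain word
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

# Hypothetical synonym table
synonyms = {"car": ["automobile", "vehicle"]}
expanded = expand_query("cheap ~car rental", synonyms)
print(expanded)
```

The expanded query can be pasted into any retrieval system that supports OR and parentheses.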

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
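
When a search system does not support truncation, the searcher can expand a truncated term by hand into an OR query. The sketch below shows this workaround under the assumption that a list of candidate words is available; the vocabulary and function names are hypothetical, and this is not a feature of Google itself.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words that match a term ending in the wildcard '*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return [term]

def to_or_query(term, vocabulary):
    """Build an OR query from all variants of a truncated term."""
    return " OR ".join(expand_truncation(term, vocabulary))

# Hypothetical word list, for instance taken from a thesaurus
vocabulary = ["library", "libraries", "librarian", "liberty"]
query = to_or_query("librar*", vocabulary)
print(query)
```

The word "liberty" is left out because it does not start with the stem "librar".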

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
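
Clustering on the fly can be illustrated with a very simple method: each retrieved hit is labelled with the word that it shares with the largest number of other hits. This is only a sketch of the general idea; it is not the algorithm of Vivisimo, which is not described here. The stop word list, the URLs and the snippets are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is", "was"}

def cluster_results(query, results):
    """results: list of (url, snippet). Returns label -> list of URLs."""
    query_words = set(query.lower().split())
    tokenized = []
    for url, snippet in results:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenized.append((url, words - STOPWORDS - query_words))
    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(w for _url, words in tokenized for w in words)
    clusters = defaultdict(list)
    for url, words in tokenized:
        # Most widely shared word wins; ties are resolved alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(url)
    return dict(clusters)

# Hypothetical hits for the ambiguous query "pascal"
results = [
    ("http://1.example/", "Blaise Pascal, philosopher and mathematician"),
    ("http://2.example/", "Pascal the philosopher on religion"),
    ("http://3.example/", "Pascal programming language tutorial"),
    ("http://4.example/", "Compiler for the Pascal programming language"),
]
clusters = cluster_results("pascal", results)
print(clusters)
```

Only the snippets returned for this one query are analysed, so no pre-processing of the documents is needed before the search.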

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
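
The integration of results with removal of repeated results can be sketched as follows. The scoring rule (reciprocal rank) and the crude URL normalisation are assumptions chosen for the illustration, not the method of any particular meta-search system; the URLs are hypothetical.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several search systems, without duplicates."""
    scores = {}
    for ranked_urls in result_lists:
        for position, url in enumerate(ranked_urls):
            # Crude normalisation so that trivially different URLs are merged
            key = url.rstrip("/").lower()
            entry = scores.setdefault(key, [0.0, url])
            # A hit earns more when it ranks high in a source list
            entry[0] += 1.0 / (position + 1)
    ordered = sorted(scores.values(), key=lambda e: (-e[0], e[1]))
    return [url for _score, url in ordered]

# Hypothetical result lists of two primary search systems
engine_a = ["http://x.example/", "http://y.example/"]
engine_b = ["http://y.example", "http://z.example/"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

A page that is found by both systems appears only once and moves up in the merged list.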

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems ask the searcher for a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
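
The difference between modified and new pages can be made concrete with a small sketch that compares stored fingerprints of page contents with the current contents. This is an illustration under the assumption that the page texts have already been downloaded; it is not the mechanism of any of the services mentioned, and the URLs are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """previous: url -> fingerprint; current_pages: url -> text.
    Returns (new URLs, modified URLs, updated fingerprints)."""
    new_pages, modified = [], []
    updated = dict(previous)
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in previous:
            new_pages.append(url)
        elif previous[url] != digest:
            modified.append(url)
        updated[url] = digest
    return new_pages, modified, updated

# First visit: store the fingerprints
_, _, stored = detect_changes({}, {"http://a.example/": "version 1"})
# Later visit: one page changed, one page is new
new_pages, modified, stored = detect_changes(
    stored,
    {"http://a.example/": "version 2", "http://b.example/": "hello"},
)
print(new_pages, modified)
```

An alert service would run such a comparison at regular intervals and send the two lists by email to the subscriber.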

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
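
Matching new books against a stored interest profile can be sketched as follows. The structure of the profile and of the book records is an assumption made for this illustration; it does not describe how Amazon or any other service stores profiles.

```python
def matching_alerts(profile, new_books):
    """Return the titles of new books that fit the keywords or categories of a profile."""
    keywords = {k.lower() for k in profile.get("keywords", [])}
    categories = {c.lower() for c in profile.get("categories", [])}
    alerts = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        in_category = book.get("category", "").lower() in categories
        if keywords & title_words or in_category:
            alerts.append(book["title"])
    return alerts

# Hypothetical profile and hypothetical list of newly published books
profile = {"keywords": ["oceanography"], "categories": ["Fisheries"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Science"},
    {"title": "Managing Fish Stocks", "category": "Fisheries"},
    {"title": "Civil Engineering Handbook", "category": "Engineering"},
]
alerts = matching_alerts(profile, new_books)
print(alerts)
```

The first book matches through a keyword in its title, the second through its subject category.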

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
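
The idea of one search action over several target systems can be sketched as follows. This is a hedged Python illustration only: the catalogues are in-memory stand-ins created by the hypothetical helper make_catalogue, the names meta_search and safe_search are invented for this sketch, and real services (for instance those using Z39.50) work with network protocols that are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalogue(records, available=True):
    """Build a stand-in search function for one library or book dealer."""
    def search(term):
        if not available:
            raise ConnectionError("catalogue unavailable")
        return [r for r in records if term.lower() in r["title"].lower()]
    return search


def meta_search(term, catalogues):
    """Query all catalogues in parallel and merge the hits by ISBN.

    Returns (merged, failed): merged maps an ISBN to its title and the
    names of the catalogues that hold it; failed lists the catalogues
    that could not be reached.
    """
    def safe_search(item):
        name, search = item
        try:
            return name, search(term)
        except ConnectionError:
            return name, None

    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        outcomes = list(pool.map(safe_search, catalogues.items()))

    merged, failed = {}, []
    for name, hits in outcomes:
        if hits is None:
            failed.append(name)
            continue
        for record in hits:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(name)
    return merged, failed
```

The sketch reflects both remarks above: the result depends on which target systems are available (unreachable ones end up in failed), and only a single simple search term is supported because it has to work on every target.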

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus systems
to find more, and more suitable, words and terms.
The searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
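
The step from thesaurus to query can be sketched as follows, using the relations BT, NT, RT and UF. This is a minimal Python illustration under stated assumptions: the entry for "sea" is invented for the example and is not taken from any real thesaurus, the function names expand and to_query are hypothetical, and the OR syntax of the generated query is an assumption that has to be checked against the help pages of the target database.

```python
# Invented sample entry; a real thesaurus would supply these terms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["bay"],             # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym (use(d) for)
    },
}


def expand(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the terms reached through the given relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_query(terms):
    """Join the terms with OR; multi-word terms are quoted as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms (UF) and narrower terms (NT) mainly serves recall; which relations to follow is a choice left to the searcher, and adding broader terms (BT) usually costs precision.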

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
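
The two directions can be sketched as follows. This is a hedged Python illustration: the reference lists in REFERENCES are invented, the function names snowball and build_citation_index are hypothetical, and commercial citation indexes are built from far larger and cleaner data than this.

```python
from collections import defaultdict

# Invented data: each document maps to the list of documents it cites.
REFERENCES = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "x": ["seed"],
    "y": ["x", "seed"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return dict(index)
```

Snowball searching only needs the documents themselves, while the forward direction needs an inverted index that somebody has to build; the length of an entry in that index is also the citation count of the document.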

211

***-

Information retrieval:
using citations (scheme)
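[Scheme: a time axis running from past through now to future.
Starting from the seed document, snowball citation searching
leads back to older publications (past), and citation indexing
leads forward to more recent publications (future).]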

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
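
Generating these variants can be sketched as follows. This is a minimal Python illustration with a hypothetical function name, url_variants; it assumes that the known URL points to a directory whose default page is called index.html, which is not true for every web server. The query syntax for the link search itself is deliberately left out, because it differs from one search system to another.

```python
def url_variants(url):
    """Return the three spellings of a site URL that may appear in links.

    Assumption: the URL points to a directory, with or without a
    trailing slash or an explicit index.html.
    """
    rest = url.split("//", 1)[-1]          # drop "http:" if present
    if rest.endswith("/index.html"):
        rest = rest[: -len("index.html")]  # keep the trailing slash for now
    base = rest.rstrip("/")
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]
```

Each variant is then searched separately and the results are combined, since a search system may treat the three spellings as different link targets.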

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can NOT ONLY use an
Internet index / search engine with specialised searching in
the hyperlink fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION

• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking better reflects relevance.
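
The difference between the two orderings can be sketched as follows. This is a deliberately simplified Python illustration: it ranks by a plain count of incoming links, whereas the link analysis actually used by Google is more elaborate, and the field names title and inbound_links as well as both function names are hypothetical.

```python
def alphabetical(sites):
    """Order directory entries by title, as in a plain alphabetical list."""
    return sorted(sites, key=lambda site: site["title"].lower())


def by_inbound_links(sites):
    """Order entries by number of incoming links, most linked first.

    Ties are broken alphabetically so that the order is stable.
    """
    return sorted(
        sites, key=lambda site: (-site["inbound_links"], site["title"].lower())
    )
```

With the same set of entries, the first function gives the order a reader would expect from an alphabetical list, while the second moves heavily linked sites to the top of the category.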

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
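
The mechanism described above can be sketched as follows. This is only a minimal illustration, assuming a tiny in-memory collection of pages that stands in for what a robot would collect; the URLs, the page texts and the function names are hypothetical and do not describe any particular search engine.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
```

The query is answered from the prepared index only, so no page has to be fetched at search time.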

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
the more familiar Google web search and the
Amazon.com book search system separately.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine All the Web, and the Internet database
Inktomi have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
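
The idea that both the number and the importance of incoming links count can be illustrated with a small sketch. This is a simplified PageRank-style iteration, not the actual algorithm used by Google; the damping value, the number of iterations and the four-page link graph are assumptions chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a link graph: links maps each page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A and B both point to C; C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph C scores higher than A and B because two pages point to it, and D scores higher than A and B because the well-linked page C points to it.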

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
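
A query of this form can also be composed by a small program. The sketch below only builds the query string; the function name is hypothetical, and the query is not sent to any search system.

```python
def expand_query(words, expand):
    """Prefix the selected words with a tilde, leaving the other words unchanged."""
    return " ".join("~" + w if w in expand else w for w in words)

query = expand_query(["word1", "word2", "word3", "word4"], expand={"word2"})
print(query)  # prints: word1 ~word2 word3 word4
```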

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
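
When truncation is not offered, the searcher has to enter the word variants explicitly. The sketch below shows what a right-truncated term would stand for; the vocabulary is a hypothetical word list, and joining the variants with OR assumes that the retrieval system accepts an OR operator.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the matching words."""
    if not term.endswith("*"):
        return [term]
    prefix = term[:-1]
    return sorted(w for w in vocabulary if w.startswith(prefix))

# Hypothetical vocabulary, for illustration only.
vocabulary = {"library", "libraries", "librarian", "liberal", "book"}
variants = expand_truncation("librar*", vocabulary)
print(" OR ".join(variants))  # prints: librarian OR libraries OR library
```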

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
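
What a proximity operator such as NEAR would test can be sketched as follows. This is only an illustration of the concept on a single text; the function name and the default distance of 5 words are assumptions, not the behaviour of any specific retrieval system.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

text = "Marine biology of the North Sea"
print(near(text, "marine", "sea"))                  # True: 5 words apart
print(near(text, "marine", "sea", max_distance=2))  # False
```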

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
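
Such an integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions: the result lists are hypothetical, the lists are simply interleaved by rank, and two URLs are treated as the same result when they differ only by a trailing slash, which is a crude rule that real meta-search systems may not use.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = url.rstrip("/")  # crude normalisation (assumption)
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged

# Hypothetical results from two primary search systems.
engine_1 = ["http://example.org/a", "http://example.org/b"]
engine_2 = ["http://example.org/b/", "http://example.org/c", "http://example.org/a"]
merged = merge_results([engine_1, engine_2])
print(merged)
```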

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
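
The difference between modified pages and new pages can be sketched as follows. This is only an illustration of one possible approach, namely comparing stored fingerprints of page contents; the URLs and contents are hypothetical, no page is actually fetched, and the services named on these slides may work differently.

```python
import hashlib

def fingerprint(content):
    """Return a fixed-length fingerprint of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched content.

    previous: dict url -> fingerprint stored at the last check
    current:  dict url -> page content fetched now
    Returns (modified, new) lists of URLs.
    """
    modified = sorted(u for u, c in current.items()
                      if u in previous and previous[u] != fingerprint(c))
    new = sorted(u for u in current if u not in previous)
    return modified, new

# Hypothetical data: what was stored last time, and what is found now.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
modified, new = changed_pages(previous, current)
print(modified, new)
```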

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
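
A minimal sketch, in Python, of how a stored keyword profile could be matched against newly published titles. The profile and the titles are invented for illustration; the matching used by real services such as the one named above is not documented here and is certainly more refined.

```python
def matches_profile(title, profile_keywords):
    """Return True when any profile keyword occurs as a word in the title."""
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return any(k.lower() in words for k in profile_keywords)


def alerts_for(new_titles, profile_keywords):
    """Select the new titles that fit the stored interest profile."""
    return [t for t in new_titles if matches_profile(t, profile_keywords)]


# Hypothetical data, for illustration only.
profile = ["aquaculture", "estuaries"]
new_titles = [
    "Sustainable Aquaculture: An Introduction",
    "A History of Lighthouses",
    "Ecology of Estuaries",
]
alerts = alerts_for(new_titles, profile)
```

Here `alerts` keeps the first and the third title; a profile based on subject categories would compare category codes instead of title words.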

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
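
The principle of simultaneous searching can be sketched in Python as follows. This is only an illustration under stated assumptions: the catalogues are small in-memory lists with invented names, and a value of `None` stands in for a target system that is not available. Real meta-search services query remote systems instead.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalogue(catalogue, query):
    """Search one target; a target that is unavailable contributes nothing."""
    if catalogue is None:  # stands in for a target system that is down
        return []
    return [title for title in catalogue if query.lower() in title.lower()]


def meta_search(catalogues, query):
    """Query all targets in parallel and merge the results without duplicates."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda c: search_catalogue(c, query),
                                     catalogues.values()))
    merged = []
    for results in result_lists:
        for title in results:
            if title not in merged:
                merged.append(title)
    return merged


# Hypothetical catalogues, for illustration only.
catalogues = {
    "library_a": ["Marine Biology", "Coastal Hydrology"],
    "library_b": ["Marine Biology", "Marine Pollution"],
    "bookdealer_c": None,
}
hits = meta_search(catalogues, "marine")
```

Only a simple substring query is sent to every target, which mirrors the limited search options mentioned above: a meta-search can only use what all target systems have in common.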

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
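
As a hedged illustration, the Python sketch below expands a search term with the synonyms (UF) and narrower terms (NT) found in a thesaurus and combines them into one query. The thesaurus entry is invented, and the `OR` operator and the quoting of phrases are assumptions: the exact query syntax depends on the database to be searched.

```python
# Hypothetical thesaurus entries, for illustration only.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"], "RT": ["tide"], "UF": ["ocean"]},
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in expanded:
                expanded.append(other)
    return expanded


def build_query(term, relations=("UF", "NT")):
    """Combine the expanded terms with OR, quoting multi-word terms."""
    parts = [f'"{t}"' if " " in t else t for t in expand_term(term, relations)]
    return " OR ".join(parts)


query = build_query("sea")
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term from the thesaurus is what improves precision.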

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
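
Both directions can be sketched in Python with a small, invented set of reference lists. The document names are hypothetical; a real citation index is built from the reference lists of very many publications.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed):
    """Follow reference lists backwards in time, starting from the seed."""
    found, to_visit = [], list(REFERENCES.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(REFERENCES.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(citing)
    return index


older_documents = snowball("seed")
newer_documents = build_citation_index(REFERENCES).get("seed", [])
```

The snowball search needs only the documents themselves, whereas finding the more recent, citing documents requires the inverted index, which is why a citation database is needed for that direction.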

211

***-

Information retrieval:
using citations (scheme)
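[Scheme: a time axis running from past, through now, to future,
with the seed document on it. Snowball citation searching leads
from the seed document back to the past; citation indexing leads
from the seed document forward to the future.]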

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
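
A minimal Python sketch of such an estimate, assuming that the results of a citation search are available as pairs of citing and cited articles. All records and author names are invented.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article) pairs and authorship.
CITATIONS = [("p4", "p1"), ("p5", "p1"), ("p5", "p2"), ("p6", "p3")]
AUTHORS = {"p1": "Author A", "p2": "Author A", "p3": "Author B"}


def citations_per_author(citations, authors):
    """Count the citations received by each author over all their articles."""
    counts = Counter()
    for _citing, cited in citations:
        if cited in authors:
            counts[authors[cited]] += 1
    return counts


counts = citations_per_author(CITATIONS, AUTHORS)
```

The result is only an estimate: it depends on which citing publications the database covers, and on whether name variants of the same author have been brought together.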

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
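
The variants listed above can be generated automatically, as in this Python sketch. The file name `index.html` is an assumption: a web server may use another default page name, which would then have to be added.

```python
def url_variants(url):
    """Return the common spellings of a site URL that others may link to."""
    base = url
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each of the three variants can then be entered as a separate link query in the search system, and the results combined.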

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 118

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
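
Browsing such a tree structure can be pictured with the Python sketch below. The categories and links are invented for illustration and do not reproduce the classification of any real directory.

```python
# Hypothetical fragment of a subject directory, for illustration only.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydrology"]},
    },
}


def browse(directory, path):
    """Walk down the tree along the chosen categories and return that node."""
    node = directory
    for category in path:
        node = node[category]
    return node


def subcategories(node):
    """List the choices offered at one level; a list of links is a leaf."""
    return sorted(node) if isinstance(node, dict) else []


choices = subcategories(browse(DIRECTORY, ["Science"]))
links = browse(DIRECTORY, ["Science", "Biology", "Marine biology"])
```

The user never formulates a query: every step is a choice among the subcategories offered at the current level, until a list of selected sites is reached.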

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
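
The mechanism described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical collection of pages (the URLs and texts are invented for illustration) instead of a real crawler, and it only shows the principle of an inverted index; real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Civil engineering handbook",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.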

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web does not build its
own unique database of WWW sites anymore, but that it
relies on the same WWW database as the earlier
competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
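
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched as follows. This is a simplified PageRank-style iteration under stated assumptions (an invented four-page link graph, a damping factor of 0.85); it is not a description of the algorithm that Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    or highly scored pages, link to it (simplified PageRank-style sketch)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
RANKS = link_rank(LINKS)
print(sorted(RANKS, key=RANKS.get, reverse=True))
```

In this toy graph, C scores above A and B because two pages point to it, and D scores above A and B because the well-linked page C points to it.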

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
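
What such an automatic expansion amounts to can be sketched as follows. The mini-thesaurus and the function name are hypothetical, and the sketch only illustrates the principle of replacing a ~word by an OR-group of the word and its synonyms; how Google implements this internally is not documented here.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
# prints: (cheap OR inexpensive OR affordable) car rental
```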

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
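
Recall and precision, mentioned above, can be computed as in the following minimal sketch. The document identifiers are invented for illustration; the definitions used are the standard ones (precision = relevant items retrieved / all items retrieved, recall = relevant items retrieved / all relevant items).

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved;
    recall = relevant retrieved / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]
PRECISION, RECALL = precision_recall(RETRIEVED, RELEVANT)
print(PRECISION, RECALL)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were retrieved (recall about 0.67).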

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
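
For readers unfamiliar with truncation, the following minimal sketch shows what such a feature does in systems that do offer it: a right-truncated term is expanded into all index words that start with the given stem. The vocabulary and the `*` notation are assumptions for illustration.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into all
    vocabulary words that start with the given stem."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))

# Hypothetical vocabulary taken from an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "index"]
print(expand_truncation("librar*", VOCABULARY))
# prints: ['librarian', 'libraries', 'library']
```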

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
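
As an illustration of what a proximity operator such as NEAR does in systems that offer it, here is a minimal sketch. The function name, the default distance of 5 words and the sample text are assumptions; actual systems define the distance in their own way.

```python
import re

def near(text, word1, word2, max_distance=5):
    """Return True when word1 and word2 occur within max_distance
    words of each other, in either order."""
    words = re.findall(r"\w+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)

TEXT = "Open access archives make scientific articles available free of charge."
print(near(TEXT, "open", "archives", max_distance=2))  # prints: True
print(near(TEXT, "open", "charge", max_distance=2))    # prints: False
```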

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
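
Clustering on the fly can be illustrated with a deliberately simple sketch: the titles of the retrieved results are grouped under the most common word they share, apart from the query words. This is only an illustration under stated assumptions (invented titles, a small stop word list); it is not the method used by Vivisimo, which is not described in this presentation.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under the most common non-query word they
    contain; titles without such a word go under 'other'."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def words(title):
        return [w for w in re.findall(r"\w+", title.lower())
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate word occurs.
    counts = Counter(w for t in titles for w in set(words(t)))
    clusters = defaultdict(list)
    for title in titles:
        candidates = words(title)
        # Pick the candidate word shared by the most titles.
        label = max(candidates, key=lambda w: counts[w], default="other")
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical titles retrieved for the ambiguous query "pascal".
TITLES = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher",
    "Programming in Pascal",
    "Pascal the philosopher and mathematician",
]
CLUSTERS = cluster_results("pascal", TITLES)
print(CLUSTERS)
```

The grouping is computed from the result list itself at query time, so no pre-processing of the documents is needed.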

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than only 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
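
The integration of results with removal of repeated results can be sketched as follows. The result lists are invented, and the URL normalisation is a deliberately rough heuristic chosen for illustration; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    path without trailing slash. A deliberately rough heuristic."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
ENGINE_1 = ["http://www.example.org/", "http://example.org/page"]
ENGINE_2 = ["http://example.org", "http://example.net/other"]
print(merge_results(ENGINE_1, ENGINE_2))
```

The first hit of the second list is recognised as a repeat of the first hit of the first list and is shown only once.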

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
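
Tracking changes in an existing page can be sketched as follows: a fingerprint of the page content is stored, and a later version is compared with it. The page contents below are invented; a real tracker would download the page periodically, and the services mentioned in this presentation may work differently.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content (bytes)."""
    return hashlib.sha256(content).hexdigest()

def has_changed(stored_fingerprint, current_content):
    """Compare the stored fingerprint with the current page content."""
    return fingerprint(current_content) != stored_fingerprint

# Hypothetical page contents; a real tracker would download these
# periodically, e.g. with urllib.request.urlopen(url).read().
OLD = b"<html><body>Version 1</body></html>"
NEW = b"<html><body>Version 2</body></html>"
STORED = fingerprint(OLD)
print(has_changed(STORED, OLD))  # prints: False
print(has_changed(STORED, NEW))  # prints: True
```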

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
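
The principle of an alert service based on stored queries can be sketched as follows: the stored query is run again at intervals, and only results that were not seen in earlier runs are reported. The URLs and the function name are hypothetical; this is an assumption about the general principle, not a description of how Google Alert is implemented.

```python
def new_hits(previous_results, current_results):
    """Return the URLs in the current results that were not seen before,
    keeping their current ranking order."""
    seen = set(previous_results)
    return [url for url in current_results if url not in seen]

# Hypothetical results of the same stored query on two different days.
LAST_WEEK = ["http://example.org/a", "http://example.org/b"]
TODAY = ["http://example.org/c", "http://example.org/a", "http://example.org/d"]
print(new_hits(LAST_WEEK, TODAY))
# prints: ['http://example.org/c', 'http://example.org/d']
```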

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
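
The matching of new books against a stored interest profile can be illustrated with a minimal sketch. The class `InterestProfile`, the function names and the sample data are hypothetical and only serve the illustration; they do not describe how Amazon or any other service implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: keywords and/or subject categories (hypothetical structure)."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, category):
    """Return True when a new book fits the profile by keyword or by subject category."""
    title_words = {word.strip(".,:;!?").lower() for word in title.split()}
    keyword_hit = any(keyword.lower() in title_words for keyword in profile.keywords)
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


def alerts(profiles, new_books):
    """List (user, title) pairs for every new book that fits a stored profile."""
    return [
        (profile.user, title)
        for profile in profiles
        for title, category in new_books
        if matches(profile, title, category)
    ]
```

A keyword profile reacts to words in the title, while a category profile reacts to the subject field assigned by the supplier; a real service would probably also match on authors, series or abstracts.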

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
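
The principle of simultaneous searching can be sketched as follows. This is only an illustration under stated assumptions: `meta_search` and the stand-in search functions are hypothetical, and a real service such as KVK would query remote catalogues (for instance through Z39.50 or web forms) instead of local functions.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, targets):
    """Send one query to several catalogues in parallel and merge the results.

    `targets` maps a catalogue name to a search function (hypothetical stand-ins
    for the remote library and bookshop systems).
    """
    def safe_search(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            # An unavailable target only reduces coverage; it does not stop the search.
            return name, None

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        outcomes = list(pool.map(safe_search, targets.items()))

    merged, failed = {}, []
    for name, records in outcomes:
        if records is None:
            failed.append(name)
            continue
        for record in records:
            # Merge duplicates by a normalised title and remember every source.
            key = record.strip().lower()
            merged.setdefault(key, {"title": record, "found_in": []})["found_in"].append(name)
    return list(merged.values()), failed
```

The sketch also shows why search options are limited: only the simple query that every target understands can be passed on, and the result depends on which targets answer.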

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term, a thesaurus gives:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
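
As a minimal sketch of this use of a thesaurus, the function below expands a single search term into a broader query. The thesaurus fragment, the function name `expand_query` and the `OR` query syntax are assumptions made for the illustration; each real search system has its own syntax.

```python
# A tiny, invented thesaurus fragment; real systems hold many more terms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the chosen relations.

    Synonyms (UF) and narrower terms (NT) mainly raise recall; pass
    relations=("UF",) when the extra terms would hurt precision.
    """
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

With the invented fragment, `expand_query("sea", THESAURUS)` yields `sea OR ocean OR "inland sea"`; a term that is not in the thesaurus is returned unchanged.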

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
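
Both directions of citation searching can be sketched on a small citation graph. The document identifiers and the data are invented, and the function names are hypothetical; the sketch only shows the principle, not how a commercial citation index is built.

```python
# Each document maps to the list of earlier documents that it cites (invented data).
CITES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["d-2003", "b-1995"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, repeatedly (the snowball effect)."""
    found, to_visit = set(), [seed]
    while to_visit:
        current = to_visit.pop()
        for cited in cites.get(current, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def build_citation_index(cites):
    """Invert the reference lists: for each document, which later documents cite it?"""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, set()).add(citing)
    return index
```

Snowballing needs only the documents themselves, whereas the forward direction needs an index that somebody has compiled over many publications; the size of each set in that index is the number of citations received.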

211

***-

Information retrieval:
using citations (scheme)
[Scheme along a time axis (Past, Now, Future): starting from the seed
document, snowball citation searching points to the past and citation
indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
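
The variants listed above can be generated automatically. The following is a minimal sketch; the function name `url_variants` is hypothetical, and the default page name `index.html` is an assumption, since the actual name depends on the configuration of the web server.

```python
def url_variants(url, default_page="index.html"):
    """Return the common spellings under which other pages may link to the same page."""
    # Drop the scheme so that "http://host/path" and "//host/path" are treated alike.
    bare = url.split("//", 1)[-1].rstrip("/")
    # Remove an explicit default page (assumed name) to get the shortest form.
    if bare.endswith("/" + default_page):
        bare = bare[: -len(default_page) - 1]
    base = "//" + bare
    return [f"{base}/{default_page}", f"{base}/", base]
```

Each variant can then be submitted in turn to the link search of the chosen search engine, using the query syntax described in its help pages.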

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
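
What such a “normal” search does can be imitated on a small set of page texts. This is only a sketch: the function name `pages_mentioning` and the sample pages in the usage note are invented, and a real search engine works on its own index of the WWW rather than on a dictionary in memory.

```python
def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the URL as a plain string.

    `pages` maps a page address to the text of that page. The scheme is dropped
    and case is ignored, so that slightly different spellings are still found.
    """
    needle = url.split("//", 1)[-1].lower()
    return sorted(address for address, text in pages.items() if needle in text.lower())
```

For instance, a page whose text mentions the address without making a hyperlink to it is found in this way, although a specialised search in hyperlink fields would miss it.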

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 119


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
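
The two components described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny in-memory collection of pages in place of a real crawler; the example URLs and the names PAGES, build_index and search are hypothetical and do not describe any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" would have collected from the Internet.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science databases",
    "http://example.org/c": "Open access journals in hydrology",
}

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine hydrology")))
```

Note that the query is answered from the index that was built beforehand, not by visiting the pages at the moment of the search.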

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
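
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a simplified link-analysis calculation. The sketch below is an assumption-laden toy (a small hypothetical link graph, a damping factor of 0.85, a fixed number of iterations); it is not the actual implementation used by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis ranking by repeated redistribution of scores.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
RANKS = link_rank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, C scores higher than A and B because two pages point to it, and D scores high because the "important" page C points to it.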

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
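
What such an automatic expansion amounts to can be sketched as follows. This is a hypothetical illustration only: the mini-thesaurus, the function name expand_query and the OR notation are assumptions, and the sketch says nothing about how Google implements the tilde operator internally.

```python
# Hypothetical mini-thesaurus: word -> synonyms or related terms.
THESAURUS = {
    "fish": ["fishes", "ichthyofauna"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, thesaurus):
    """Expand only the words that are marked with a leading tilde."""
    parts = []
    for token in query.split():
        if token.startswith("~"):
            word = token[1:].lower()
            alternatives = [word] + thesaurus.get(word, [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the word as it is.
                parts.append(word)
        else:
            parts.append(token.lower())
    return " ".join(parts)

print(expand_query("~sea fish pollution", THESAURUS))
```

Only the marked word is replaced by a group of alternatives; the other words of the query stay unchanged.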

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
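
The two quality measures named above, recall and precision, can be computed as in the following sketch. The document identifiers and the two result lists are invented for illustration; they are not measurements of any real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved documents.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents are relevant for the information need.
RELEVANT = {"d1", "d2", "d3", "d4"}
GENERAL = ["d1", "x1", "x2", "x3", "x4", "d2", "x5", "x6"]
SPECIALISED = ["d1", "d2", "d3", "x1"]

print("general:", precision_recall(GENERAL, RELEVANT))
print("specialised:", precision_recall(SPECIALISED, RELEVANT))
```

In this invented case the specialised system retrieves fewer irrelevant documents (higher precision) and more of the relevant ones (higher recall).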

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
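
To make clear what two of these missing features mean, the sketch below shows what truncation and a proximity operator do in retrieval systems that offer them. The function names, the example text and the distance of 3 words are assumptions chosen for illustration.

```python
import re

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncation(text, stem):
    """True if any word in the text starts with the stem (as in 'librar*')."""
    return any(word.startswith(stem.lower()) for word in tokenize(text))

def near(text, word_a, word_b, max_distance=3):
    """True if both words occur within max_distance words of each other."""
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

TEXT = "Libraries offer open access to marine science databases."

print(matches_truncation(TEXT, "librar"))      # matches "libraries"
print(near(TEXT, "marine", "databases"))       # 2 words apart
print(near(TEXT, "libraries", "databases"))    # 7 words apart
```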

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
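
Grouping results on the fly, using only the text of the retrieved hits, can be illustrated with a very simple sketch. The approach below (label each hit with the most widely shared term besides the query) is an assumption chosen for brevity; it is not the method used by Vivisimo, and the snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "was"}

def cluster_results(query, snippets):
    """Group snippets under the most common term (besides the query) they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokenized = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenized.append(words - query_words - STOPWORDS)
    # Count in how many snippets each candidate term occurs.
    frequency = Counter(word for words in tokenized for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        if words:
            # Most widely shared term; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Free Pascal compiler for the Pascal programming language",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, "->", members)
```

No document is processed before the search: the groups are derived only from the hits that come back for this one query.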

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than only 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
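
The integration of results with removal of repeated hits can be sketched as follows. The three engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query over the Internet to the primary search systems, and the interleaving order used here is just one possible design choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for calls to independent primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def engine_three(query):
    return ["http://example.org/a", "http://example.org/d"]

def meta_search(query, engines):
    """Query all engines in parallel and merge the results without duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max(len(results) for results in result_lists)
    # Interleave: first hit of every engine, then the second hit, and so on.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

RESULTS = meta_search("marine biology", [engine_one, engine_two, engine_three])
print(RESULTS)
```

The engines are queried at the same time in separate threads, which is where the time saving comes from; each URL appears only once in the merged list.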

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
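
The difference between a modified page and a new page can be made concrete with a small sketch that compares two snapshots of a set of pages. The snapshots, the URLs and the function names are hypothetical; a real alert service would fetch the pages periodically and send the report by email.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of the text of a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots (URL -> page text); report new and modified pages."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Hypothetical snapshots taken at two different moments.
LAST_WEEK = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About this project",
}
TODAY = {
    "http://example.org/news": "Conference programme published",
    "http://example.org/about": "About this project",
    "http://example.org/report": "Annual report",
}

NEW_PAGES, MODIFIED = detect_changes(LAST_WEEK, TODAY)
print("new:", NEW_PAGES)
print("modified:", MODIFIED)
```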

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of book prices across the catalogues of several bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
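The idea of one search action spread over several target systems can be sketched as follows. This is only an illustration under stated assumptions: the catalogues are small in-memory lists, the function names are hypothetical, and records are merged on an invented ISBN field. Real meta-search services query remote catalogues, for instance over Z39.50 or HTTP.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented sample catalogues; a real target would be a remote system.
CATALOGUE_A = [
    {"isbn": "111", "title": "Marine Ecology"},
    {"isbn": "222", "title": "Coastal Management"},
]
CATALOGUE_B = [
    {"isbn": "111", "title": "Marine Ecology"},
    {"isbn": "333", "title": "Marine Pollution"},
]


def make_target(records):
    """Wrap a list of records as a searchable target (title word search only)."""
    def search(query):
        q = query.lower()
        return [r for r in records if q in r["title"].lower()]
    return search


def unavailable_target(query):
    """Simulates a target system that is down."""
    raise ConnectionError("target system is down")


def meta_search(query, targets):
    """Query all targets in parallel; merge hits by ISBN and note failures."""
    merged, failed = {}, []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                failed.append(name)  # the result depends on target availability
                continue
            for rec in hits:
                entry = merged.setdefault(
                    rec["isbn"], {"title": rec["title"], "sources": []}
                )
                entry["sources"].append(name)
    return merged, failed


targets = {
    "A": make_target(CATALOGUE_A),
    "B": make_target(CATALOGUE_B),
    "C": unavailable_target,
}
merged, failed = meta_search("marine", targets)
print(merged)
print("failed targets:", failed)
```

The sketch also shows the two remarks above: coverage grows with every target added, but only the simplest search option (a word in the title) can be offered, because it must work on every target.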

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term, a thesaurus can record:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)
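These relations can be represented as a small data structure. The sketch below is an assumption for illustration, not the format of any particular thesaurus system; the class name, helper function and sample terms are invented. It shows that BT and NT are reciprocal: recording one implies the other.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT
    used_for: list = field(default_factory=list)  # UF (synonyms)


def add_broader(thesaurus, term, broader_term):
    """Record 'term BT broader_term' and the reciprocal 'broader_term NT term'."""
    thesaurus.setdefault(term, ThesaurusEntry(term)).broader.append(broader_term)
    thesaurus.setdefault(
        broader_term, ThesaurusEntry(broader_term)
    ).narrower.append(term)


# Invented sample terms, for illustration only.
thesaurus = {}
add_broader(thesaurus, "sea", "body of water")
add_broader(thesaurus, "lake", "body of water")
thesaurus["sea"].used_for.append("ocean")
thesaurus["sea"].related.append("coast")

print(thesaurus["body of water"].narrower)
print(thesaurus["sea"])
```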

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
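A minimal sketch of this use of a thesaurus: each concept in the search is expanded with its synonyms (UF) and narrower terms (NT), the alternatives are combined with OR, and the concepts with AND. The thesaurus contents and function names are invented for illustration, and the Boolean syntax is an assumption; the exact syntax differs between search systems.

```python
# Invented mini-thesaurus, for illustration only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "BT": ["body of water"]},
    "pollution": {"UF": ["contamination"], "NT": [], "BT": []},
}


def quote(term):
    """Put multi-word terms between quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def expand_concept(term, thesaurus, relations=("UF", "NT")):
    """OR-combine a term with the terms found through the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(concepts, thesaurus):
    """AND-combine the expanded concepts into one Boolean query string."""
    return " AND ".join(expand_concept(c, thesaurus) for c in concepts)


print(build_query(["sea", "pollution"], THESAURUS))
# (sea OR ocean OR "coastal waters") AND (pollution OR contamination)
```

Adding synonyms and narrower terms mainly increases recall; choosing the more suitable terms, and leaving out broader terms as done here by default, helps to keep precision.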

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
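Both directions of citation searching can be sketched with a small reference table. Following the references of the seed document leads backwards in time (the snowball effect); inverting the table gives a citation index that leads forwards to more recent citing publications. The document identifiers and function names are invented for illustration.

```python
from collections import defaultdict

# Invented data: each document with the list of documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1985"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def snowball(seed, references, depth=2):
    """Older documents reached by following references up to 'depth' steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        cited = {c for doc in frontier for c in references.get(doc, [])}
        frontier = cited - found
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference table: for each document, who cites it?"""
    index = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    return index


citation_index = build_citation_index(REFERENCES)
print(sorted(snowball("seed-2000", REFERENCES)))
print(sorted(citation_index["seed-2000"]))
print(len(citation_index["old-1995"]))  # number of citations received
```

The size of an entry in the inverted table is the number of citations received by that document within the data covered, which is why such counts depend on the coverage of the citation index used.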

211

***-

Information retrieval:
using citations (scheme)
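[Scheme: a time axis (Past, Now, Future) with the seed document on it; snowball citation searching leads back in time from the seed document, and citation indexing leads forward in time from it.]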

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
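The variants listed above can be generated automatically before running a link search. The sketch below is an illustration under one stated assumption: that the default document of the directory is named index.html (other servers may use other names, which can be passed in). The function name is hypothetical.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the spellings under which the same page may be linked."""
    base = url
    # Strip a trailing default document name, if present.
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]
            break
    base = base.rstrip("/")
    # With default document, with trailing slash, without trailing slash.
    return [f"{base}/{name}" for name in index_names] + [base + "/", base]


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then searched separately, with the link search syntax of the chosen search system, and the results are combined.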

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
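
The two components described above can be illustrated with a minimal sketch in Python. The page addresses and texts are invented for illustration, and a real "robot" and index are far more elaborate; the sketch only shows why a query is answered from a prepared index and not from the live Internet.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "civil engineering handbook",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# The index is built once, before any user query arrives.
INDEX = build_index(PAGES)
```

A query such as `search(INDEX, "marine hydrology")` only consults the stored index, so pages that changed or appeared after the last crawl are not reflected in the result.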

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
well-known PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
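
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched as follows. This is a simplified link-analysis iteration written for illustration, not the algorithm as actually implemented by Google; the link graph, the damping value of 0.85 and the number of iterations are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis ranking over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to,
                # so a link from a high-scoring page counts for more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages point to "hub", one points to "lonely".
LINKS = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a"],
}
RANKS = link_rank(LINKS)
```

In this invented graph "hub" ends up with the highest score because it receives the most links, and "a" scores well because the important page "hub" points to it.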

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
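
What such an expansion amounts to can be sketched in a few lines of Python. The synonym table is invented for illustration, and the sketch does not claim to reproduce how Google processes the tilde internally; it only shows a marked word being replaced by an OR group of the word and its synonyms.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR group of the word and its listed synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is searched as it is.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.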

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
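
Where truncation is not offered, a searcher can write out the word variants by hand. The sketch below illustrates what a truncated term such as `librar*` stands for, by expanding it against a small, invented word list into an explicit OR query. The assumption that the target search system accepts OR between terms is ours.

```python
import fnmatch

# Hypothetical vocabulary; in practice the variants come from the
# searcher's own knowledge or from a word list.
VOCABULARY = ["library", "libraries", "librarian", "librarians", "liberty"]

def expand_truncation(pattern, vocabulary=VOCABULARY):
    """Expand a truncated term such as 'librar*' into an explicit OR query."""
    matches = sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))
    # Without any known variant, fall back to the bare word stem.
    return " OR ".join(matches) if matches else pattern.rstrip("*")
```

The word "liberty" is not included in the expansion of `librar*`, which shows that truncation matches on the written stem only and not on meaning.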

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
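
Grouping results on the fly, using nothing but the text of the retrieved hits, can be illustrated with a deliberately crude sketch. This is not the method used by Vivisimo; the stop word list, the labelling rule (the word shared with the most other hits) and the example snippets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Small, hypothetical stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {w.strip(".,;:!?").lower() for w in snippet.split()}
        tokenised.append(words - query_words - STOPWORDS - {""})
    # Count in how many snippets each candidate label word occurs.
    counts = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Label: the word found in the most snippets; ties broken alphabetically.
        label = min(words, key=lambda w: (-counts[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical hits for the ambiguous query "pascal".
HITS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
CLUSTERS = cluster_results("pascal", HITS)
```

No document is analysed before the search; the headings are derived from the hits only after they have been retrieved.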

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
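
The integration of results with removal of repeated results can be sketched as follows. The result lists are invented, and the rule used to decide that two addresses are "the same" (ignoring a leading www., the case of the host name and a trailing slash) is an assumption; real meta-search systems may use other rules.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no www., no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        # Take the hit at this rank from each primary search system in turn.
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical result lists from two primary search systems.
ENGINE_1 = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/x"]
ENGINE_2 = ["http://vub.ac.be/BIBLIO", "http://example.org/y"]
MERGED = merge_results(ENGINE_1, ENGINE_2)
```

Interleaving by rank keeps the top hit of every primary system near the top of the merged list, while the repeated library address appears only once.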

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet and the WWW. Elements shown:
• telnet, ftp, ...
• static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• databases and file archives accessible through the Internet (CGI, ASP,...)
• information accessible only when passwords are used
• rapidly changing information, such as news
• Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
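
The difference between a modified page and a new page can be made concrete with a small sketch. It assumes that the tracking service stores a fingerprint of each page at every check; fetching the pages is left out, and the function names are hypothetical.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace changes."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_pages(stored, current):
    """Compare stored fingerprints with the page texts found now.

    stored:  dict {url: fingerprint from the previous check}
    current: dict {url: page text fetched now}
    Returns (modified_urls, new_urls), each sorted.
    """
    modified = sorted(u for u, text in current.items()
                      if u in stored and stored[u] != fingerprint(text))
    new = sorted(u for u in current if u not in stored)
    return modified, new
```

Tracking modifications only requires the addresses that the user has registered, whereas finding new pages requires a source of candidate addresses, for instance the results of a stored query on an Internet index.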

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of shops that
sell books (as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
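
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration: the `Profile` class, the matching rule and the sample book titles are assumptions made up for this sketch, not a description of how Amazon or any other service works.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """An interest profile as it might be stored on a server (hypothetical)."""
    user: str
    keywords: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


def matches(profile: Profile, title: str, category: str) -> bool:
    """Return True when a new book fits the profile.

    A book fits if its subject category is in the profile, or if any
    profile keyword occurs as a whole word in the title (case-insensitive).
    """
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    keyword_hit = any(k.lower() in words for k in profile.keywords)
    return keyword_hit or category in profile.categories


def alerts(profile: Profile, new_books: list[tuple[str, str]]) -> list[str]:
    """Select the titles of new books that should trigger an alert."""
    return [title for title, category in new_books
            if matches(profile, title, category)]


# Invented sample data: (title, subject category).
profile = Profile("reader", keywords={"aquaculture"}, categories={"Oceanography"})
new_books = [
    ("Aquaculture: Principles and Practice", "Agriculture"),
    ("Tides and Currents", "Oceanography"),
    ("A History of Printing", "History"),
]
print(alerts(profile, new_books))
```

The first book matches on a keyword and the second on a subject category, which corresponds to the two forms of profile named on the slide.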

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
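
A minimal sketch of what such a meta-search service does with the answers of its target systems. The three target functions and their titles are invented stand-ins for real catalogues, and the targets are queried one after the other for simplicity, whereas a real service would query them in parallel.

```python
def normalise(title: str) -> str:
    """Reduce a title to a crude comparison key (lower case, alphanumerics only)."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def meta_search(query: str, targets: dict) -> tuple[list[dict], list[str]]:
    """Send one query to every target and merge the hits.

    `targets` maps a catalogue name to a search function. A target that
    raises an exception is reported as unavailable and skipped, so the
    result depends on the availability of the target systems.
    """
    merged: dict[str, dict] = {}
    unavailable: list[str] = []
    for name, search in targets.items():
        try:
            hits = search(query)
        except Exception:
            unavailable.append(name)
            continue
        for title in hits:
            record = merged.setdefault(normalise(title),
                                       {"title": title, "found_in": []})
            record["found_in"].append(name)
    return list(merged.values()), unavailable


# Hypothetical target systems with invented holdings.
def library_a(query):
    return [t for t in ["Marine Ecology", "Coastal Management"]
            if query.lower() in t.lower()]


def bookshop_b(query):
    return [t for t in ["marine ecology", "Marine Biology"]
            if query.lower() in t.lower()]


def library_c(query):
    raise ConnectionError("target system is down")


results, unavailable = meta_search(
    "marine",
    {"Library A": library_a, "Bookshop B": bookshop_b, "Library C": library_c},
)
print(results)
print(unavailable)
```

Only a plain keyword is passed to every target, which illustrates why the search options of a meta-search service tend to be limited to what all target systems have in common.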

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a Term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
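
The procedure described above can be sketched in a few lines. The thesaurus entry for "sea" is invented for illustration, and the `OR` operator with quoted phrases is an assumed query syntax; the actual syntax depends on the database that is searched.

```python
# Hypothetical thesaurus entry; real thesaurus systems offer far more terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader terms
        "NT": ["bay", "gulf"],      # narrower terms
        "RT": ["coast"],            # related terms
    },
}


def expand(term: str, relations=("UF", "NT")) -> list[str]:
    """Return the term followed by the thesaurus terms found for it.

    Synonyms and narrower terms are used by default, because they tend to
    raise recall without drifting away from the subject.
    """
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_query(terms: list[str]) -> str:
    """Combine terms with OR; multi-word terms are quoted as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


print(to_query(expand("sea")))
print(to_query(expand("sea", relations=("UF", "BT"))))
```

Choosing which relations to follow is a trade-off: broader and related terms usually increase recall at the cost of precision.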

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
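
Both directions can be illustrated with a toy reference list. The document identifiers and their references below are invented; a citation index is in essence the inverted form of such reference lists.

```python
# Each document maps to the documents it cites (invented sample data).
REFERENCES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed: str, references: dict) -> list[str]:
    """Follow reference lists backwards in time, starting from the seed.

    Returns every older document reachable through citations, in the
    order in which it is discovered (the snowball effect).
    """
    found: list[str] = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def cited_by(seed: str, references: dict) -> list[str]:
    """Look forwards in time: documents whose reference list contains the seed."""
    return sorted(doc for doc, refs in references.items() if seed in refs)


print(snowball("seed-2000", REFERENCES))
print(cited_by("seed-2000", REFERENCES))
```

The backward search needs only the documents themselves, while the forward search needs an index that covers the later literature, which is why citation indexes are a separate kind of database.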

211

***-

Information retrieval:
using citations (scheme)
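[Diagram: a time axis running from Past through Now to Future.
Starting from the seed document, snowball citation searching leads to
older publications; citation indexing leads to more recent ones.]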

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
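
A small helper can generate these variants so that none is forgotten. This is a sketch: the function name is hypothetical, and `index.html` is only assumed as the name of the default page, which differs between web servers.

```python
def link_variants(url: str, index_name: str = "index.html") -> list[str]:
    """Return the spellings under which other pages may link to the same page."""
    # Drop an optional scheme ("http://") and any leading slashes.
    rest = url.split("://", 1)[-1].lstrip("/")
    # Remove a trailing index file name, if present.
    if rest.endswith("/" + index_name):
        rest = rest[: -len(index_name)]
    base = rest.rstrip("/")
    return [f"//{base}/{index_name}", f"//{base}/", f"//{base}"]


for variant in link_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link search, in the syntax of the chosen search system, and the results are combined.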

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
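
The difference between the two approaches can be shown on a handful of made-up pages. The page addresses and contents are invented, and the regular expressions are a deliberately crude stand-in for what a search engine does when it indexes hyperlink fields and visible text.

```python
import re

TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"

# Invented pages: address -> HTML source.
PAGES = {
    "http://site-a.example/": f'<a href="{TARGET}">a useful page</a>',
    "http://site-b.example/": f"<p>See {TARGET} for details.</p>",
    "http://site-c.example/": "<p>Nothing relevant here.</p>",
}


def link_search(target: str, pages: dict) -> list[str]:
    """Specialised searching: pages with a hyperlink that points to the target."""
    href = re.compile(r'href="([^"]*)"')
    return sorted(url for url, html in pages.items()
                  if target in href.findall(html))


def phrase_search(target: str, pages: dict) -> list[str]:
    """Normal searching: pages whose visible text contains the URL as a string."""
    tags = re.compile(r"<[^>]+>")
    return sorted(url for url, html in pages.items()
                  if target in tags.sub("", html))


print(link_search(TARGET, PAGES))
print(phrase_search(TARGET, PAGES))
```

In this toy collection each method finds a page that the other misses, which suggests why the two are best used together.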

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 121

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

- contents
- summary
- structure
- overview
of this tutorial / workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
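The "more links received, higher rank" idea can be sketched in a few lines. This is only an illustration of counting inbound links; the function name, the site names and the link data are hypothetical, and the real Google ranking is certainly more elaborate.

```python
from collections import Counter

def rank_by_inbound_links(links, directory_entries):
    """Order directory entries by how many distinct sites link to them."""
    # Deduplicate (source, target) pairs and ignore self-links.
    unique_links = {(src, dst) for src, dst in links if src != dst}
    inbound = Counter(dst for _, dst in unique_links)
    # Sort by descending link count; break ties alphabetically.
    return sorted(directory_entries, key=lambda site: (-inbound[site], site))

# Hypothetical link data: (linking site, linked site).
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("c.example", "c.example"),  # self-link, ignored
]
entries = ["a.example", "b.example", "c.example"]
ranked = rank_by_inbound_links(links, entries)
print(ranked)  # ['c.example', 'b.example', 'a.example']
```

An alphabetical listing, as in DMOZ, would instead show a.example first, regardless of how often it is cited.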

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
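The mechanism described on this slide can be sketched as a tiny inverted index: the "robot" step is replaced here by a hand-made dictionary of pages, and the search step looks up query words in the index instead of scanning the pages themselves. The URLs, the texts and the function names are hypothetical, and real Internet indexes are far more sophisticated.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stand-in for the pages a crawler would have collected beforehand.
pages = {
    "http://example.org/tides": "Tides and currents in the North Sea",
    "http://example.org/fish": "Fish stocks in the North Sea",
    "http://example.org/lakes": "Hydrology of alpine lakes",
}
index = build_index(pages)
hits = search(index, "north sea fish")
print(hits)  # {'http://example.org/fish'}
```

Because only the pre-built index is consulted, a query is fast, but pages that the robot has not collected cannot be found.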

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine All the Web,
and the Internet database Inktomi
are owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
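The two ranking criteria (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis iteration in the style of PageRank. This is a sketch under assumptions: the damping value, the iteration count, the function name and the small graph are all chosen for illustration, and the slide does not state which algorithm Google actually uses.

```python
def link_scores(graph, damping=0.85, iterations=100):
    """Iteratively score pages; graph maps each page to the pages it links to.

    Every linked page must itself be a key of the graph.
    """
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical mini-web: "hub" is cited by three pages.
# "a" and "d" each receive exactly one link, but "a" is cited
# by the important "hub", while "d" is cited by the obscure "c".
graph = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["d"],
    "d": ["hub"],
}
scores = link_scores(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example "hub" ends up on top because many pages point to it, and "a" scores higher than "d" although both receive a single link, because the link to "a" comes from a more important page.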

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
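What such a tilde expansion amounts to can be sketched as follows. The synonym table, the function names and the choice to combine the words with AND are assumptions made for this illustration; how Google selects synonyms internally is not described here.

```python
def expand_query(query, synonyms):
    """Return one list of alternatives per query word."""
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            # The marked word is replaced by itself plus its synonyms.
            groups.append([word] + list(synonyms.get(word, [])))
        else:
            groups.append([token])
    return groups

def as_boolean(groups):
    """Write the expanded query as an explicit Boolean expression."""
    parts = []
    for alternatives in groups:
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(parts)

# Hypothetical synonym table.
SYNONYMS = {"pollution": ["contamination", "pollutants"]}

expanded = as_boolean(expand_query("marine ~pollution belgium", SYNONYMS))
print(expanded)
# marine AND (pollution OR contamination OR pollutants) AND belgium
```

Only the word marked with the tilde is broadened; the other words stay as they were typed.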

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
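The terms recall and precision used on this slide can be made concrete with a small calculation: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that are retrieved. The document identifiers and the two result sets below are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical situation: 4 relevant documents exist (d1..d4).
relevant = ["d1", "d2", "d3", "d4"]
# A general engine returns 8 results, of which 2 are relevant.
general = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
# A specialised engine returns 4 results, of which 3 are relevant.
specialised = ["d1", "d2", "d3", "x1"]

print(precision_recall(general, relevant))      # (0.25, 0.5)
print(precision_recall(specialised, relevant))  # (0.75, 0.75)
```

In this invented case the specialised system is better on both measures; in practice the outcome depends on the subject and on the systems compared.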

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
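For readers who have not met truncation in other retrieval systems, the following sketch shows what a manual right-hand truncation such as hydrolog* does: it matches every index word that starts with the given stem. The vocabulary and the function name are hypothetical; the sketch uses shell-style wildcards from the Python standard library, which is only one possible notation.

```python
from fnmatch import fnmatchcase

def truncation_search(pattern, vocabulary):
    """Return the index words matched by a truncated search term."""
    pattern = pattern.lower()
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))

# Hypothetical list of words occurring in an index (lower case).
vocabulary = ["hydrology", "hydrological", "hydrologist", "hydraulic", "ecology"]

matches = truncation_search("hydrolog*", vocabulary)
print(matches)  # ['hydrological', 'hydrologist', 'hydrology']
```

Without truncation, the searcher has to type each of these word forms separately and combine them with OR.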

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
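A proximity operator such as NEAR checks whether two words occur close to each other in a text, not merely somewhere in the same document. A minimal sketch, assuming that distance is counted in words and that a maximum distance of 5 is wanted (both are assumptions, as systems differ on this point):

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "Pollution of coastal waters affects fish stocks in the North Sea"

print(near(text, "pollution", "fish"))  # True  (5 words apart)
print(near(text, "pollution", "sea"))   # False (10 words apart)
```

A plain AND search would accept both combinations, which shows why a proximity operator can increase precision.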

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out) <-> client computer + WWW client program <-> Internet / WWW <-> WWW server computer <-> WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> client computer + WWW client program <-> Internet / WWW <-> WWW server computer <-> WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
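To give an impression of clustering "on the fly", the sketch below groups a handful of retrieved snippets under the word they share with other results, using only the result list itself and no pre-processed document collection. This is a deliberately naive illustration with invented results; it is not the method used by Vivisimo, which is not described in these slides.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "by"}

def cluster_results(query, results):
    """Group (url, snippet) results under a shared descriptive word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    doc_terms = []
    for url, snippet in results:
        terms = set(re.findall(r"[a-z]+", snippet.lower()))
        doc_terms.append((url, terms - STOPWORDS - query_words))
    # Count in how many results each word occurs.
    doc_freq = Counter(t for _, terms in doc_terms for t in terms)
    clusters = defaultdict(list)
    for url, terms in doc_terms:
        shared = [t for t in terms if doc_freq[t] >= 2]
        # Label: the most widely shared word; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        clusters[label].append(url)
    return dict(clusters)

# Invented result list for the ambiguous query "pascal".
results = [
    ("http://example.org/1", "Blaise Pascal, French philosopher and mathematician"),
    ("http://example.org/2", "Pascal the philosopher wrote about religion"),
    ("http://example.org/3", "Pascal programming language tutorial"),
    ("http://example.org/4", "Compilers for the Pascal programming language"),
]
clusters = cluster_results("pascal", results)
for label, urls in clusters.items():
    print(label, urls)
```

The two meanings of the query end up under separate headings ("philosopher" and "language"), so the searcher can pick the relevant group first.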

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
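The integration step with removal of repeated results can be sketched as follows. The engine names, the URLs, the normalisation rules and the ordering rule (results found by more engines first) are assumptions chosen for this illustration; actual meta-search systems each use their own merging method.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivially different spellings of a URL to one key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Merge ranked result lists from several engines, without repeats."""
    merged = {}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalise(url), {"url": url, "engines": [], "best_rank": rank})
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Results found by more engines come first, then the best rank.
    return sorted(merged.values(),
                  key=lambda e: (-len(e["engines"]), e["best_rank"], e["url"]))

# Hypothetical result lists returned by two primary search systems.
result_lists = {
    "engine_a": ["http://www.example.org/tides/", "http://example.org/fish"],
    "engine_b": ["http://example.org/tides", "http://example.net/lakes"],
}
merged = merge_results(result_lists)
for entry in merged:
    print(entry["url"], entry["engines"])
```

The page about tides is reported by both engines under slightly different addresses, but it appears only once in the merged list.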

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
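Tracking modifications of an existing page can be done by storing a fingerprint of its contents and comparing it at the next visit. The sketch below assumes that the page contents have already been fetched (they are given as plain strings here), and the URLs and function names are hypothetical; existing tracking services may work differently.

```python
import hashlib

def fingerprint(content):
    """Short, fixed-length summary of a page's contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current page contents.

    previous: dict url -> fingerprint from the last visit
    current_pages: dict url -> page contents fetched now
    Returns (list of changed URLs, new fingerprints to store).
    """
    changed = []
    new_state = {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        new_state[url] = digest
        if url in previous and previous[url] != digest:
            changed.append(url)
    return changed, new_state

# Hypothetical contents of two tracked pages on two different days.
day1 = {"http://example.org/news": "Conference programme: draft",
        "http://example.org/about": "About us"}
day2 = {"http://example.org/news": "Conference programme: final",
        "http://example.org/about": "About us"}

_, state = detect_changes({}, day1)        # first visit: nothing to compare
changed, state = detect_changes(state, day2)
print(changed)  # ['http://example.org/news']
```

Only the fingerprints have to be stored between visits, not the complete pages.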

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
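The principle of an alert service based on stored queries can be sketched as: run each stored query again, compare the results with what was seen before, and report only the difference. The search function is replaced here by a lookup in invented data, and all names are hypothetical; this does not describe how Google Alert is actually implemented.

```python
def check_alerts(stored_queries, search, seen):
    """Run the stored queries and return only results not seen before.

    search: function that returns a list of URLs for a query
    seen: dict query -> set of URLs already reported (updated in place)
    """
    alerts = {}
    for query in stored_queries:
        current = set(search(query))
        fresh = current - seen.get(query, set())
        if fresh:
            alerts[query] = sorted(fresh)
        seen[query] = seen.get(query, set()) | current
    return alerts

# Invented search results on two consecutive days.
index_day1 = {"marine biology": ["http://example.org/a", "http://example.org/b"]}
index_day2 = {"marine biology": ["http://example.org/a", "http://example.org/b",
                                 "http://example.org/c"]}

seen = {}
first = check_alerts(["marine biology"], lambda q: index_day1.get(q, []), seen)
second = check_alerts(["marine biology"], lambda q: index_day2.get(q, []), seen)
print(second)  # {'marine biology': ['http://example.org/c']}
```

On the first run every result is new; afterwards the subscriber only receives a message when the underlying index has picked up an additional page.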

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
a particular application is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM                                     RECOMMENDED SYSTEMS

To find book titles                     ?
about a specific subject / topic

To find book titles                     ?
published before 1990

To find a book title                    ?
through a title search

To find the price                       ?
of a book

To be informed regularly                ?
about new books

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
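
A keyword-based interest profile of this kind can be sketched as follows. This is a minimal illustration, not the method of any particular bookshop: the profile, the book titles and the function names are all hypothetical, and matching is done on whole words in the title only.

```python
def matches_profile(title: str, keywords: set) -> bool:
    """Return True when at least one profile keyword occurs as a word in the title."""
    words = {word.strip(".,:;!?()").lower() for word in title.split()}
    return bool(words & {keyword.lower() for keyword in keywords})


def select_alerts(new_titles: list, profile: set) -> list:
    """Keep only the newly published titles that fit the stored interest profile."""
    return [title for title in new_titles if matches_profile(title, profile)]


# Hypothetical profile stored on the server for one user.
profile = {"oceanography", "fisheries"}

# Hypothetical list of newly published books.
new_titles = [
    "Introduction to Oceanography",
    "Cooking for Beginners",
    "Fisheries Management: A Primer",
]

alerts = select_alerts(new_titles, profile)
print(alerts)
```

A profile based on subject categories would work the same way, with the comparison made against the category assigned to each book instead of the words in its title.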

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
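
The principle of simultaneous searching can be sketched as below. The catalogues are hypothetical in-memory stand-ins (a real meta-search service would query remote systems, for instance through Z39.50 or web forms), and one of them is deliberately marked as unavailable to show how the result depends on the target systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues; None simulates an unavailable system.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Ecology", "Ocean Currents"],
    "bookshop_c": None,
}


def search_one(name: str, query: str) -> list:
    """Search a single catalogue for titles that contain the query string."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not reachable")
    return [title for title in titles if query.lower() in title.lower()]


def meta_search(query: str) -> tuple:
    """Query all catalogues in parallel, merge the results and remove duplicates."""
    merged = set()
    failed = []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
    for name, future in futures.items():
        try:
            merged.update(future.result())
        except ConnectionError:
            failed.append(name)
    return sorted(merged), failed


titles, unavailable = meta_search("marine")
print(titles, unavailable)
```

Only a plain text query is passed to every target, which reflects why the search options of such services tend to be limited to what all target systems have in common.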

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM                                     RECOMMENDED SYSTEMS

To find book titles about a             Library of Congress, British Library,
specific subject / topic                (Amazon)

To search for book titles               national libraries, Barnes & Noble,
published before 1990                   Infoball, Alapage, Abebooks

Book title search                       Library of Congress, British Library,
in general                              Infoball

To find the price                       Global Books in Print, Infoball,
of a book                               online bookshops

To be informed regularly about          Amazon, Alapage, Bol
new books

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender  ->  Editor  ->  Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a given term:

BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
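
This use of a thesaurus before searching can be sketched as a simple query expansion. The thesaurus entry below is a hypothetical illustration, and the `OR` operator and the double quotes around phrases are assumptions about the query syntax of the target database, which differs from system to system.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Phrases of several words are quoted so that they are searched as a unit.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms mainly raises recall; choosing a single, more specific term found in the thesaurus instead of a vague one is the corresponding way to raise precision.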

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
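
Both directions of citation searching can be sketched on a small citation graph. The document identifiers and reference lists are hypothetical; in practice the backward direction uses the reference lists of the documents themselves, while the forward direction needs a citation index.

```python
# Hypothetical data: each document maps to the older documents that it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str, references: dict) -> list:
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        document = queue.pop(0)
        if document not in found:
            found.append(document)
            queue.extend(references.get(document, []))
    return found


def citing_documents(target: str, references: dict) -> list:
    """Return the (more recent) documents whose reference lists contain the target."""
    return sorted(doc for doc, refs in references.items() if target in refs)


older_documents = snowball("seed", REFERENCES)
newer_documents = citing_documents("seed", REFERENCES)
print(older_documents, newer_documents)
```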

211

***-

Information retrieval:
using citations (scheme)
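[Scheme: on a time axis (past, now, future), snowball citation
searching leads from the seed document back to older publications,
while citation indexing leads forward to more recent publications.]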

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
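
Estimating citation counts from the results of a citation search amounts to simple counting, as in the sketch below. The records are hypothetical; each one stands for a single citation found by a citation search, and the counts are only as complete as the database that was searched.

```python
from collections import Counter

# Hypothetical citation search results:
# (citing document, cited article, author of the cited article)
CITATIONS = [
    ("doc1", "article_a", "Author X"),
    ("doc2", "article_a", "Author X"),
    ("doc2", "article_b", "Author X"),
    ("doc3", "article_c", "Author Y"),
]

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited, _ in CITATIONS)

# Number of citations received by each author, over all of that author's articles.
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```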

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
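
The variants listed above can be generated systematically, as in this minimal sketch. The function name is hypothetical, and the file names `index.html` and `index.htm` are assumptions about the default page of the web server; other servers use other defaults.

```python
def url_variants(url: str) -> list:
    """Return common spellings of the same address that other pages may link to."""
    base = url
    # Strip an explicit default page name, if present (assumed names).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant is then submitted separately to the link search of the chosen search engine, using the query syntax described in its help pages.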

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 122

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
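
Browsing such a tree structure can be sketched in Python as follows. The category names, the example.org addresses and the function name `browse` are invented for illustration; real directories are far larger and often allow cross-links between branches.

```python
# Hypothetical miniature subject directory: categories nest, leaves hold links.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydro"]},
    },
    "Reference": {"Dictionaries": ["http://example.org/dict"]},
}


def browse(tree, path):
    """Follow a list of category names down the tree and return what is found there."""
    node = tree
    for category in path:
        node = node[category]
    return node


# Browsing needs no query: the user only chooses a category at each level.
print(sorted(browse(DIRECTORY, ["Science"])))
print(browse(DIRECTORY, ["Science", "Biology", "Marine biology"]))
```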

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: within the complete WWW, a global subject directory can lead to]
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
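
A minimal sketch of this mechanism in Python. The page texts, the example.org addresses and the function names `build_index` and `search` are invented for illustration; a real system collects the pages with a crawler and uses far more elaborate indexing and ranking.

```python
from collections import defaultdict

# Hypothetical pages that a "robot" would have collected from the Internet.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "history of the encyclopedia",
}


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# The index is built once, beforehand; a query only consults the index.
INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only, which is why the search does not have to visit the Internet in real time.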

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
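
The idea of link-based ranking can be illustrated with a simplified Python sketch. This is not Google's actual algorithm, which is not described here; it is a basic power iteration in the style of published link-analysis methods. The function name `link_rank`, the damping value 0.85 and the four-page link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score, computed by power iteration.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Hypothetical link graph: A and B point to C, and C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
SCORES = link_rank(LINKS)
print(sorted(SCORES, key=SCORES.get, reverse=True))
```

In this small graph, C scores well because two pages point to it, and D scores well although only one page points to it, because that page (C) is itself "important".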

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
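
What such an automatic expansion amounts to can be sketched in Python. The synonym table and the function name `expand_query` are invented for illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every word marked with a leading tilde by an OR group of synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("used ~car prices"))
# prints: used (car OR automobile OR vehicle) prices
```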

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
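
Recall and precision can be computed as follows; a short Python sketch in which the document identifiers and the function name `precision_recall` are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are really relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d2", "d5"]
print(precision_recall(RETRIEVED, RELEVANT))
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).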

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
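
To make clear what is missing: in a system that supports manual truncation, a pattern such as librar* retrieves all words that start with the same letters. A minimal Python sketch, with an invented word list and an assumed function name `truncation_search`:

```python
def truncation_search(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical list of words that occur in an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]
print(truncation_search("librar*", VOCABULARY))
print(truncation_search("library", VOCABULARY))
```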

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
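
A proximity operator such as NEAR requires that two words occur close to each other in the text. The following Python sketch shows the principle; the function name `near` and the default distance of 3 words are assumptions for the example, not the behaviour of any particular search system.

```python
def near(text, word1, word2, max_distance=3):
    """Return True when word1 and word2 occur within max_distance words of each other."""
    tokens = text.lower().split()
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


TEXT = "open access sources help scientists to find information free of charge"
print(near(TEXT, "open", "sources"))      # the words are 2 positions apart
print(near(TEXT, "open", "information"))  # the words are 7 positions apart
```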

96

**--

Meta-search systems:
scheme 1
[Scheme showing: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Scheme showing: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Scheme showing: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
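
How Vivisimo clusters results is not described here. The following Python sketch only illustrates the general idea of grouping retrieved titles on the fly, under headings taken from words that several results share. The titles, the stop word list and the function name `cluster_results` are invented for the example.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by"}


def cluster_results(query, titles):
    """Group result titles under the most widely shared word they contain."""
    query_words = set(query.lower().split())

    def terms(title):
        return {w for w in title.lower().split()
                if w not in STOP_WORDS and w not in query_words}

    # Count in how many titles each word occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title))

    clusters = defaultdict(list)
    for title in titles:
        candidates = [w for w in terms(title) if doc_freq[w] > 1]
        if candidates:
            # Most shared word first; alphabetical order breaks ties.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Pensees by Blaise Pascal philosopher",
    "Pascal unit of pressure",
]
CLUSTERS = cluster_results("pascal", RESULTS)
for label, members in CLUSTERS.items():
    print(label, members)
```

Nothing is prepared before the search: the headings are derived only from the results that were retrieved for this query.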

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme showing: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
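
A sketch in Python of what such an integration involves: the same query is sent to several search systems in parallel and the result lists are merged, with repeated results removed. The two "engines" are stand-in functions that return fixed example.org addresses; a real meta-search system would contact remote servers. All names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two remote Internet search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Query all engines in parallel and merge their hits without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the result lists in the order of the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for hits in result_lists:
        for url in hits:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


MERGED = meta_search("marine biology", [engine_one, engine_two])
print(MERGED)
```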

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme showing the parts of the Internet:]
»telnet, ftp, ...
»WWW: static texts that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes
»Databases and file archives accessible through the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
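
Tracking changes in an existing page can be done by storing a fingerprint of its contents and comparing it at each later visit. A minimal Python sketch; fetching the page over the Internet is left out, and the function names and the example.org address are invented for illustration.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_for_change(url, current_content, known_fingerprints):
    """Return True when the page differs from the version seen last time.

    known_fingerprints maps each URL to the fingerprint stored at the
    previous visit; it is updated on every call.
    """
    new = fingerprint(current_content)
    changed = known_fingerprints.get(url) not in (None, new)
    known_fingerprints[url] = new
    return changed


STORE = {}
URL = "http://example.org/news"
print(check_for_change(URL, "version 1 of the page", STORE))  # first visit
print(check_for_change(URL, "version 1 of the page", STORE))  # unchanged
print(check_for_change(URL, "version 2 of the page", STORE))  # modified
```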

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
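
The principle of such an alert service, sketched in Python: the stored query is run again at regular intervals and only the results that were not seen before are reported. The function name `run_alert`, the query and the example.org addresses are invented for illustration; running the query against the index and sending the e-mail are left out.

```python
def run_alert(query, latest_results, store):
    """Return the results for a stored query that were not reported before.

    store maps each stored query to the set of results already seen.
    """
    seen = store.setdefault(query, set())
    fresh = [url for url in latest_results if url not in seen]
    seen.update(latest_results)
    return fresh


ALERTS = {}
QUERY = "coastal hydrology"
print(run_alert(QUERY, ["http://example.org/1", "http://example.org/2"], ALERTS))
print(run_alert(QUERY, ["http://example.org/1", "http://example.org/2",
                        "http://example.org/3"], ALERTS))
```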

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
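A minimal sketch of matching new book titles against a keyword-based interest profile. The profile, the titles and the function name are invented for illustration; a real service would probably also match on subject categories.

```python
import re


def matches_profile(title, keywords):
    """True when at least one profile keyword occurs as a word in the title."""
    words = set(re.findall(r"\w+", title.lower()))
    return any(keyword.lower() in words for keyword in keywords)


# Hypothetical interest profile stored on the server for one user.
profile = ["aquaculture", "fisheries"]

# Hypothetical list of newly published titles.
new_books = [
    "Sustainable Aquaculture: an introduction",
    "A History of Printing",
    "Fisheries management in Europe",
]

alerts = [title for title in new_books if matches_profile(title, profile)]
print(alerts)
```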

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
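The principle of simultaneous searching can be sketched as follows: the same query is sent to every target, and the results are merged with duplicates removed. The catalogues here are small in-memory lists with hypothetical names; a real meta-search service would query remote systems, for instance through Z39.50.

```python
def search_catalogue(records, query):
    """Return the titles in one catalogue that contain the query string."""
    q = query.lower()
    return [title for title in records if q in title.lower()]


def meta_search(catalogues, query):
    """Search all catalogues and merge the hits, grouping duplicate titles."""
    merged = {}
    for name, records in catalogues.items():
        for title in search_catalogue(records, query):
            # Normalise case and spacing so the same title is recognised twice.
            key = " ".join(title.lower().split())
            entry = merged.setdefault(key, {"title": title, "sources": []})
            entry["sources"].append(name)
    return list(merged.values())


# Hypothetical catalogues of one library and one bookshop.
catalogues = {
    "library_a": ["Marine Biology", "Ocean Currents"],
    "bookshop_b": ["marine biology", "Marine Pollution"],
}

results = meta_search(catalogues, "marine")
for entry in results:
    print(entry["title"], entry["sources"])
```

Only a simple keyword search is offered here because that is the one option every target can be assumed to support, which illustrates why search options in such systems are rather limited.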

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
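The procedure described above can be sketched as a simple query expansion. The thesaurus entry below is invented for illustration, and the quoted-phrase and OR syntax is an assumption about the target database; check the help pages of the system you actually search.

```python
# Toy thesaurus with one hypothetical entry, using the usual relation codes.
TOY_THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "RT": ["marine"],
        "BT": ["body of water"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its terms in the chosen relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("sea", TOY_THESAURUS))
print(expand_query("sea", TOY_THESAURUS, relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms tends to increase recall; adding broader terms would usually reduce precision, which is why they are not included by default in this sketch.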

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
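Both directions of citation searching can be illustrated on a small, invented citation graph. The document names and the data structure are hypothetical; a real citation index holds the same kind of "who cites whom" information on a much larger scale.

```python
from collections import deque

# Hypothetical data: each document mapped to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "old2"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        document = queue.popleft()
        for cited in references.get(document, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append(cited)
    return found


def citing_documents(target, references):
    """Look up the more recent documents that cite the target."""
    return sorted(doc for doc, cited in references.items() if target in cited)


print(snowball("seed", REFERENCES))
print(citing_documents("seed", REFERENCES))
```

The length of the list returned by `citing_documents` is the number of citations received by the document within this data set.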

211

***-

Information retrieval:
using citations (scheme)
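Scheme: a time axis running from past to future, with the seed document in the middle (now).
• Snowball citation searching leads from the seed document back to older publications (past).
• Citation indexing leads from the seed document forward to more recent publications that cite it (future).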

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
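The variants listed above can be generated mechanically. This is a minimal sketch with a hypothetical helper name; it starts from the longest form of the URL, and the link-search syntax into which each variant must be placed depends on the search engine used.

```python
def url_variants(url):
    """Return the URL without scheme, plus shorter forms that point to the same page."""
    rest = url.split("://", 1)[-1]  # drop "http://" if present
    variants = [rest]
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]  # keep the trailing slash
            variants.append(rest)
            break
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Running one link search per variant and combining the results gives a more complete picture of the citations received by the page.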

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 123

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
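
The mechanism described on this slide can be illustrated with a minimal sketch in Python. It assumes a tiny in-memory collection of pages instead of a real crawler; the example URLs and the names `tokenize`, `build_index` and `search` are hypothetical and serve only as an illustration, not as a description of any particular search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
INDEX = build_index(PAGES)
```

A query is answered from the index only; the pages themselves are not visited at search time, which is why the database has to be refreshed continuously by the robot.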

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered, but
the first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
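
The idea of ranking by links can be sketched in Python as follows. This is a simplified, PageRank-style iteration and not the actual algorithm used by Google; the damping value of 0.85, the function name `link_rank` and the four-page link graph are assumptions made only for this illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or highly scored ("important") pages, link to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: pages A, B and D all link to C; C links to A.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
RANKS = link_rank(LINKS)
```

In this small graph, page C ends up with the highest score because three pages point to it, and page A scores higher than B or D because the "important" page C points to it.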

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
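
What such an expansion amounts to can be sketched in Python. The sketch only illustrates the principle; it does not show how Google implements the tilde operator. The synonym list and the function name `expand_query` are hypothetical.

```python
# Hypothetical, hand-made synonym list; a real system would rely on a thesaurus.
SYNONYMS = {
    "fish": ["fishes", "seafood"],
    "ocean": ["sea", "marine"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)
```

For instance, `expand_query("pollution ~ocean coast")` yields `pollution (ocean OR sea OR marine) coast`, so documents using any of the three words for the marked concept can be retrieved.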

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage

• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005

• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
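
Recall and precision, mentioned on this slide, can be made concrete with a small worked example in Python. Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of all relevant documents that were retrieved. The function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents are relevant, 2 of them were retrieved.
PRECISION, RECALL = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d2", "d4", "d5"],
)
```

Here precision is 2/4 and recall is 2/3: half of what was retrieved is useful, and one relevant document was missed.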

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
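
To show what is missing, here is a minimal Python sketch of right-hand truncation as offered by many classical retrieval systems. The asterisk as truncation symbol and the function name `truncation_match` are assumptions for this illustration only.

```python
def truncation_match(pattern, words):
    """Return the words matched by a pattern; a trailing '*' means
    'any word that starts with this stem'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern.lower()]
```

With truncation, the single query term `librar*` matches library, libraries and librarian; without it, the searcher has to type each variant separately.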

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
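
A proximity operator such as NEAR can be sketched in Python as follows. The sketch only illustrates the concept; the function name `near` and the default distance of 5 words are assumptions.

```python
import re


def near(text, word1, word2, max_distance=5):
    """True when word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


SAMPLE = ("Open access journals are growing, while commercial databases "
          "still dominate access to chemistry information")
```

A plain AND query accepts a document as soon as both words occur anywhere in it; a proximity query accepts it only when the words are close together, which usually gives higher precision.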

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
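
Clustering on the fly can be illustrated with a very simple Python sketch: the result snippets are grouped under words that several of them share, using only the results of the current search. This is not the method used by Vivisimo, which is not described here; the stop word list, the function name `cluster_results` and the snippets are hypothetical.

```python
import re
from collections import defaultdict

# Small, hypothetical list of words that are too common to label a cluster.
STOPWORDS = {"the", "of", "a", "and", "in", "to", "by", "for"}


def cluster_results(query, snippets):
    """Group snippets under every non-query word shared by at least two of them."""
    query_words = set(re.findall(r"\w+", query.lower()))
    groups = defaultdict(list)
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        for word in sorted(words - query_words - STOPWORDS):
            groups[word].append(snippet)
    # Keep only the headings that really group several results.
    return {word: items for word, items in groups.items() if len(items) >= 2}


SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Blaise Pascal biography",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
```

For the ambiguous query pascal, the snippets fall into one group under "programming" and one under "blaise", so the user can pick the intended meaning with one click.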

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
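
The integration of results with removal of repeated hits can be sketched in Python. The sketch assumes that each primary search system returns a ranked list of URLs; the normalisation rule (ignoring "www.", trailing slashes and query strings) and the names `normalize` and `merge_results` are simplifying assumptions, not the method of any specific meta-search system.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparable key: lowercase host without 'www.',
    path without trailing slash (query strings are ignored in this sketch)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical result lists returned by two primary search systems.
ENGINE_A = ["http://www.dmoz.org/", "http://bubl.ac.uk/link/"]
ENGINE_B = ["http://dmoz.org", "http://www.lii.org/", "http://bubl.ac.uk/link"]
MERGED = merge_results(ENGINE_A, ENGINE_B)
```

The five hits from the two systems are reduced to three distinct sites, and the top-ranked hit of each system stays near the top of the merged list.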

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Components shown:]
• Internet: telnet, ftp, ...
• WWW
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also use other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
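
The difference between a modified page and a new page can be illustrated with a minimal sketch. It assumes that two snapshots of a set of pages are already available as plain text; the URLs and page texts are invented for illustration, and a real alert service would also fetch the pages and send the email.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous: dict, current: dict) -> dict:
    """Compare two snapshots (URL -> page text) and classify each URL."""
    report = {"new": [], "modified": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)          # page did not exist before
        elif fingerprint(previous[url]) != fingerprint(text):
            report["modified"].append(url)     # same URL, different content
    return report


# Hypothetical snapshots taken at two moments in time.
old = {
    "http://example.org/a": "Tides  and currents",
    "http://example.org/b": "Fish stocks 2004",
}
new = {
    "http://example.org/a": "Tides and currents",
    "http://example.org/b": "Fish stocks 2005",
    "http://example.org/c": "New page",
}
report = changed_pages(old, new)
print(report)
```

Comparing digests instead of full texts means that the tracking system only has to store one short string per page.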

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM → RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic → ?
• To find book titles published before 1990 → ?
• To find a book title through a title search → ?
• To find the price of a book → ?
• To be informed regularly about new books → ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
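
How a stored interest profile can be matched against newly published books can be sketched as follows. This is only an illustration under simple assumptions: the book records, the keyword set and the category names are invented, and real services such as the example above may work quite differently.

```python
def matches_profile(book: dict, keywords: set, categories: set) -> bool:
    """True when a title word is in the keywords or the category is in the profile."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    return bool(title_words & keywords) or book["category"] in categories


# Hypothetical list of newly published books.
new_books = [
    {"title": "Marine Ecology: An Introduction", "category": "biology"},
    {"title": "Cooking for Two", "category": "food"},
    {"title": "Coastal Engineering", "category": "oceanography"},
]

# Hypothetical interest profile stored on the server (keywords in lower case).
profile_keywords = {"marine"}
profile_categories = {"oceanography"}

alerts = [
    book["title"]
    for book in new_books
    if matches_profile(book, profile_keywords, profile_categories)
]
print(alerts)
```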

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
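
The principle of simultaneous searching can be sketched as below. The two target catalogues are simulated with small in-memory lists (invented records), and merging on ISBN is an assumption made for this sketch; real meta-search services send the query to remote systems, for instance through Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target catalogues, simulated in memory.
CATALOGUES = {
    "library_a": [
        {"isbn": "111", "title": "Marine Biology"},
        {"isbn": "222", "title": "Ocean Currents"},
    ],
    "bookshop_b": [
        {"isbn": "111", "title": "Marine Biology"},
        {"isbn": "333", "title": "Marine Pollution"},
    ],
}


def search_one(name: str, word: str) -> list:
    """Title-word search in one (simulated) target catalogue."""
    return [
        dict(rec, source=name)
        for rec in CATALOGUES[name]
        if word.lower() in rec["title"].lower()
    ]


def meta_search(word: str) -> dict:
    """Query all targets in parallel and merge the results, one entry per ISBN."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_one(name, word), CATALOGUES))
    merged = {}
    for records in result_lists:
        for rec in records:
            entry = merged.setdefault(rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(rec["source"])
    return merged


results = meta_search("marine")
print(results)
```

Only a simple title-word search is offered here because it is the one option that every target supports, which reflects why the search options of meta-search services are rather limited.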

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM → RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic →
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990 →
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general →
Library of Congress, British Library, Infoball
• To find the price of a book →
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books →
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
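
The use of a thesaurus before searching can be sketched as a small query expansion step. The thesaurus entry below is invented for illustration, and combining the terms with OR and quoting phrases is an assumption about the query language of the target database; the help pages of the actual system should be checked.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "ocean": {
        "UF": ["sea"],
        "NT": ["atlantic ocean"],
        "BT": ["water body"],
        "RT": ["marine science"],
    },
}


def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Add synonyms and narrower terms to a search term and join them with OR."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term] + [t for rel in relations for t in entry.get(rel, [])]
    # Phrases of several words are put between quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("ocean")
print(query)
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; replacing a vague word with a more suitable term found in the thesaurus is the step that can improve precision.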

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
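
Both directions of citation searching can be sketched on a very small, invented citation network. The document names are hypothetical; in practice the reference lists come from the publications themselves and the reverse lookup is what a citation index provides.

```python
# Hypothetical reference lists: document -> documents that it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "seed"],
}


def snowball(doc: str, cites: dict = CITES) -> set:
    """Backward search: follow the reference lists, starting from doc."""
    found, stack = set(), [doc]
    while stack:
        for ref in cites.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)  # also follow the references of this reference
    return found


def citing(doc: str, cites: dict = CITES) -> set:
    """Forward search: documents whose reference lists contain doc."""
    return {d for d, refs in cites.items() if doc in refs}


older_publications = snowball("seed")
more_recent_publications = citing("seed")
print(sorted(older_publications), sorted(more_recent_publications))
```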

211

***-

Information retrieval:
using citations (scheme)
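[Scheme: a time axis running from past through now to future, with the seed document in the centre; snowball citation searching leads back to older publications, citation indexing leads forward to more recent publications.]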

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
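
Generating the variants listed above can be sketched as follows. The function assumes that index.html or index.htm is the default document of the directory, which depends on the configuration of the web server; the function name is invented for this illustration.

```python
def url_variants(url: str) -> list:
    """Return three common spellings under which the same page may be linked."""
    base = url.split("://", 1)[-1]            # drop "http://" if present
    for default_doc in ("index.html", "index.htm"):
        if base.endswith("/" + default_doc):  # assumption: default document name
            base = base[: -len(default_doc)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be entered in the link search of the chosen search engine, and the results can be combined.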

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
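
The idea "more links received = higher rank" can be sketched in a few lines. This is only an illustration with hypothetical site names and link data; it is not the actual method of Google, which weighs links in a more refined way.

```python
from collections import Counter

def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by the number of distinct pages linking to them."""
    # links: (source_page, target_page) pairs; all data here is hypothetical.
    inlinks = Counter(target for source, target in set(links) if source != target)
    # Ties fall back to alphabetical order, as in the plain DMOZ listing.
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))

entries = ["a.example", "b.example", "c.example"]
links = [
    ("x.example", "b.example"),
    ("y.example", "b.example"),
    ("x.example", "c.example"),
]
print(rank_by_inlinks(entries, links))
```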

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
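
The mechanism described above (collected pages, a machine-made index of their words, a search function on top) can be sketched as follows. The page texts and URLs are hypothetical, and the choice to require all query words is an assumption of this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a "robot" would have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can answer quickly but may be out of date.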

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
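
Both ranking criteria above (many pages point to it, "important" pages point to it) can be illustrated with a simplified, iterative link-analysis score. This is a sketch in the spirit of published link-analysis methods, with a hypothetical link graph and assumed parameter values; it should not be read as the algorithm Google actually runs.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for p in pages:
                    new[p] += damping * scores[page] / n
        scores = new
    return scores

# Hypothetical graph: a and b link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = link_scores(links)
print(sorted(scores, key=scores.get, reverse=True))
```

Page c ranks first because two pages point to it; page a ranks above b because the "important" page c points to it, while nothing points to b.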

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
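
What such an automatic expansion amounts to can be sketched with a tiny, hypothetical synonym list. The function name, the list and the OR notation of the output are assumptions made for this illustration; how Google implements the tilde internally is not described here.

```python
# Hypothetical mini-thesaurus; a real system would rely on a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fish": ["fishes"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word as it is
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```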

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
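
Recall and precision, as used above, are the standard measures for a result set: precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved. A small worked example with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 3 relevant documents were found (recall 0.67).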

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
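
To make clear what a proximity operator such as NEAR would add, here is a minimal sketch of the test it performs on a text. The function name and the default distance are assumptions for this illustration only.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

text = "open access journals in marine science"
print(near(text, "open", "marine", max_distance=4))
print(near(text, "open", "marine", max_distance=3))
```

A plain AND query would accept this text in both cases; the proximity test accepts it only when the allowed distance is large enough.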

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
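
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves. The following toy heuristic labels each result with the word it shares with the most other results; the snippets and the stop word list are hypothetical, and this is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was"}

def cluster_results(query, snippets):
    """Group result snippets under the word each shares with most other results."""
    query_words = set(re.findall(r"\w+", query.lower()))
    tokenized = [
        set(re.findall(r"\w+", s.lower())) - STOPWORDS - query_words
        for s in snippets
    ]
    # In how many snippets does each word occur?
    frequency = Counter(word for words in tokenized for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Most widely shared word wins; ties are broken alphabetically.
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Blaise Pascal, French philosopher",
    "Pascal the philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler for the programming language",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, "->", members)
```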

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based
information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
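
Such an integration with removal of repeated results can be sketched as follows. The result lists are hypothetical and the interleaving strategy is only one possible choice; real meta-search systems may merge and rank differently.

```python
def merge_results(*result_lists):
    """Interleave ranked result lists from several engines, dropping repeats."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                # Only a trailing slash is ignored; case is kept, because
                # paths in WWW addresses can be case-sensitive.
                key = url.rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged

engine_a = ["http://example.org/a", "http://example.org/b"]
engine_b = ["http://example.org/b", "http://example.org/c", "http://example.org/d"]
merged = merge_results(engine_a, engine_b)
print(merged)
```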

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
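
The core of such a tracking service can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by Copernic or by any alert service: the function names and the in-memory store are hypothetical, and fetching the page and sending the email are left out. The sketch stores a fingerprint of the page text and compares it at the next visit.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_update(url: str, page_text: str, seen: dict) -> str:
    """Compare a page with the fingerprint stored at the previous visit.

    Returns "new" for a URL not seen before, "changed" when the content
    differs from the previous visit, and "unchanged" otherwise.
    """
    current = fingerprint(page_text)
    previous = seen.get(url)
    seen[url] = current  # remember the latest version for the next check
    if previous is None:
        return "new"
    return "changed" if previous != current else "unchanged"
```

A real service would call such a check at regular intervals for every tracked page and send an alert only for the results "new" and "changed".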

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
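
How a stored interest profile can be matched against new books is sketched below. This is a simplified illustration, not the way Amazon or any other bookshop implements its service; the record fields (`title`, `category`) and the profile layout are assumptions made for this example.

```python
import re


def matches_profile(book: dict, profile: dict) -> bool:
    """Return True when a new book fits the stored interest profile.

    A book fits when its subject category is listed in the profile, or when
    one of the profile keywords occurs as a word in the title (case-insensitive).
    """
    if book.get("category") in profile.get("categories", set()):
        return True
    title_words = set(re.findall(r"\w+", book.get("title", "").lower()))
    return any(keyword.lower() in title_words for keyword in profile.get("keywords", []))


def select_alerts(new_books: list, profile: dict) -> list:
    """Return the titles of the new books that should trigger an alert."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]
```

For example, a profile with the keyword "aquaculture" and the category "Oceanography" selects a new title containing the word "Aquaculture" as well as any new book filed under "Oceanography".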

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
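
The merging step of such a meta-search service can be sketched as follows. The sketch assumes that each target catalogue has already returned a list of titles; sending the query to the target systems in parallel is left out. The function names and the data layout are hypothetical and are not taken from any of the systems mentioned here.

```python
def normalise(title: str) -> str:
    """Lower-case a title and collapse whitespace so that duplicates can be recognised."""
    return " ".join(title.lower().split())


def merge_results(results_per_catalogue: dict) -> list:
    """Merge the hit lists of several catalogues into one list without duplicates.

    The input maps a catalogue name to the list of titles that it returned.
    The output is a list of (title, sorted catalogue names) tuples,
    in order of first appearance.
    """
    merged = {}  # normalised title -> (title as first seen, set of catalogues)
    for catalogue, titles in results_per_catalogue.items():
        for title in titles:
            key = normalise(title)
            if key not in merged:
                merged[key] = (title, set())
            merged[key][1].add(catalogue)
    return [(title, sorted(sources)) for title, sources in merged.values()]
```

Because only the fields that all target systems have in common can be compared (here just the title), the sketch also illustrates why the search options of meta-search services are rather limited.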

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, do not choose the
normal text search, but IMAGES.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
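
This way of working can be sketched in Python. The small thesaurus below is invented for illustration only, and the function name and its parameters are assumptions; a real searcher would look up the terms in one of the thesaurus systems on the WWW. The sketch adds synonyms (UF) and narrower terms (NT) to the original term and combines them with OR, which mainly serves recall.

```python
# Hypothetical thesaurus entries, invented for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],       # synonyms
        "NT": ["North Sea"],   # narrower terms
        "BT": ["water body"],  # broader terms
        "RT": ["coast"],       # related terms
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build a Boolean OR query from a term and its thesaurus relatives.

    By default only synonyms (UF) and narrower terms (NT) are added.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

The resulting query string can then be pasted into a search system that supports OR and phrase searching; the exact syntax differs from system to system.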

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
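
Both directions can be illustrated with a small citation graph. The data below are hypothetical and the function names are assumptions; the sketch only shows the principle. Snowball searching follows reference lists back in time, while a citation index answers the reverse question: which documents cite the seed?

```python
# Hypothetical citation data: each document maps to the documents that it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str, cites: dict) -> set:
    """Follow reference lists backwards in time, repeatedly (the snow-ball effect)."""
    found, to_visit = set(), [seed]
    while to_visit:
        document = to_visit.pop()
        for reference in cites.get(document, []):
            if reference not in found:
                found.add(reference)
                to_visit.append(reference)
    return found


def cited_by(seed: str, cites: dict) -> set:
    """Return the more recent documents that cite the seed, as a citation index does."""
    return {document for document, references in cites.items() if seed in references}
```

Snowball searching needs only the documents themselves, whereas the reverse lookup needs a database that has indexed the reference lists of many publications.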

211

***-

Information retrieval:
using citations (scheme)
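[Scheme: a time axis (past, now, future). Starting from the seed
document, snowball citation searching leads back into the past and
citation indexing leads forward towards the future.]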

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
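
Once a citation search has produced a list of citing records, the counting itself is simple. The sketch below assumes a hypothetical record layout (`cited_article`, `cited_authors`) and is not tied to any particular citation database; it counts citations per article and per author.

```python
from collections import Counter


def count_citations(citations: list) -> tuple:
    """Count the citations received per article and per author.

    Each record describes one citation found by a citation search and holds
    the identifier of the cited article and the names of its authors.
    """
    per_article = Counter()
    per_author = Counter()
    for citation in citations:
        per_article[citation["cited_article"]] += 1
        for author in citation["cited_authors"]:
            per_author[author] += 1
    return per_article, per_author
```

The counts are only estimates: they depend on the coverage of the database that was searched and on variant spellings of author names.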

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
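
Generating these variants can be automated. The sketch below is a minimal illustration; the function name is hypothetical, and the default file names "index.html" and "index.htm" are assumptions, because a web server may use another default file name.

```python
def url_variants(url: str) -> list:
    """Return the common forms under which the start page of a web site may be linked.

    Covers the three variants listed above: with the index file name,
    with a trailing slash, and without a trailing slash.
    """
    base = url
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant can then be used in a separate link search, and the results can be combined afterwards.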

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
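
The two components described above (a machine-built index over collected pages, and a search function over that index) can be sketched in a few lines. This is only a toy illustration: the page texts and the example.org URLs are invented stand-ins for what a "robot" would collect, and the function names are hypothetical, not those of any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology databases and marine data",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))
```

The query is answered from the prebuilt index only; the pages themselves are not contacted at search time, which is the point made on the previous slide.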

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered: also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
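
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not Google's actual implementation: the four-page link graph, the damping value 0.85 and the function name link_rank are all chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    or highly scored pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes an equal share of its score to each page it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this small graph, C ranks first because three pages point to it, and A ranks above B and D because its single incoming link comes from the highly ranked page C.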

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
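
What the tilde asks the search system to do can be sketched as a rewrite of the query: each marked word becomes a group of alternatives joined by OR. The synonym table and the function name below are hypothetical; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; a real engine would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each ~word as an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap ~car rental brussels"))
```

Words without a tilde are passed through unchanged, so the searcher stays in control of which concepts are broadened.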

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
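
For readers unfamiliar with the term: truncation means that a word fragment in the query matches every indexed word that starts with it. The sketch below shows what a system that does support it would do. The asterisk as truncation symbol, the small vocabulary and the function name are assumptions made for illustration.

```python
# Hypothetical vocabulary of words that occur in an index.
VOCABULARY = ["librarian", "libraries", "library", "libretto", "marine"]

def expand_truncation(term, vocabulary=VOCABULARY):
    """Return the vocabulary words matched by a right-truncated term such as 'librar*'."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return [w for w in vocabulary if w.startswith(prefix)]
    # Without the truncation symbol, only the exact word matches.
    return [w for w in vocabulary if w == term.lower()]

print(expand_truncation("librar*"))
print(expand_truncation("library"))
```

Without such a feature, the searcher has to type the variants (library, libraries, librarian...) explicitly and combine them in the query.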

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
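
A proximity operator such as NEAR requires that two words occur close to each other in a document, which is stricter than a plain AND. The following sketch shows the idea on one text; the sample sentence, the distance of 3 words and the function names are assumptions for illustration.

```python
import re

def positions(text):
    """Map each word to the list of its positions in the text."""
    index = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(word, []).append(pos)
    return index

def near(text, word1, word2, max_distance=3):
    """True if word1 and word2 occur within max_distance words of each other."""
    index = positions(text)
    return any(
        abs(p1 - p2) <= max_distance
        for p1 in index.get(word1.lower(), [])
        for p2 in index.get(word2.lower(), [])
    )

DOC = "Open access journals improve access to marine science for all researchers"
print(near(DOC, "open", "journals"))
print(near(DOC, "open", "researchers"))
```

Both word pairs would satisfy an AND query on this text, but only the first pair satisfies the proximity condition.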

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
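
Clustering "on the fly" means that the groups are derived from the retrieved results themselves, at search time. The sketch below uses a deliberately crude heuristic (each result is filed under the term it shares with the most other results). It is not the Vivisimo algorithm, which is not described in these slides; the snippets, the stopword list and the function name are invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared term each one contains."""
    query_words = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(words - STOPWORDS - query_words)
    # Document frequency: in how many snippets does each term occur?
    df = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        # Pick the most widely shared term; break ties alphabetically.
        label = min(terms, key=lambda t: (-df[t], t)) if terms else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal the philosopher: life of Blaise Pascal",
]
for label, items in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", items)
```

Nothing is computed before the query arrives: the term counts come only from the snippets that were just retrieved.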

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
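
The integration step mentioned above can be sketched as follows: the query is sent to several primary search systems at the same time, and the returned lists are merged into one list in which each address appears only once. The two "engines" here are hypothetical stand-ins that return fixed lists, and the scoring rule (sum of 1/rank) is an assumption chosen for simplicity, not the method of any named meta-search system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines: each returns a ranked URL list.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/d", "http://example.org/a"]

def meta_search(query, engines):
    """Query all engines in parallel, then merge and de-duplicate the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # A result scores more when it ranks high and when several engines return it.
            scores[url] = scores.get(url, 0.0) + 1.0 / (position + 1)
    return sorted(scores, key=lambda url: (-scores[url], url))

print(meta_search("marine biology", [engine_one, engine_two]))
```

Because the scores are kept in a dictionary keyed by URL, a page returned by both engines appears once in the merged list, with a combined score.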

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
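
The principle behind such tracking can be sketched in a few lines of Python. This is only an illustration, assuming that the text of each page has already been downloaded; the function names `fingerprint` and `detect_changes` and the example URLs are hypothetical, and real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(stored: dict, fetched: dict) -> dict:
    """Compare freshly fetched pages with the stored fingerprints.

    stored  maps URL -> fingerprint seen at the previous check
    fetched maps URL -> page text retrieved now
    Returns URL -> "new" or "modified"; unchanged pages are omitted.
    """
    alerts = {}
    for url, text in fetched.items():
        current = fingerprint(text)
        if url not in stored:
            alerts[url] = "new"
        elif stored[url] != current:
            alerts[url] = "modified"
        stored[url] = current  # remember the state for the next check
    return alerts


# Hypothetical example data, not taken from a real site.
seen = {"http://example.org/a.html": fingerprint("old text")}
now = {
    "http://example.org/a.html": "updated text",
    "http://example.org/b.html": "brand new page",
}
page_alerts = detect_changes(seen, now)
print(page_alerts)
```

A client program would run such a check on your own workstation; a server-based alert service would run it on its side and send the result by email.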

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: ?
To find book titles published before 1990: ?
To find a book title through a title search: ?
To find the price of a book: ?
To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
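
As a rough sketch of how a stored interest profile could be matched against newly published books: the data structures, the function names `matches_profile` and `new_book_alerts`, and the sample books are all invented for illustration, and this is not a description of how any particular bookshop implements its service.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True when a book fits the stored interest profile.

    A book fits when one of the profile keywords occurs as a word in
    its title, or when its subject category is in the profile.
    """
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def new_book_alerts(new_books: list, profile: dict) -> list:
    """Return the titles of the new books that fit the profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical profile and hypothetical list of new books.
profile = {"keywords": ["aquaculture"], "categories": {"Marine science"}}
new_books = [
    {"title": "Aquaculture in Europe", "category": "Agriculture"},
    {"title": "Tides and Currents", "category": "Marine science"},
    {"title": "Medieval Poetry", "category": "Literature"},
]
book_alerts = new_book_alerts(new_books, profile)
print(book_alerts)
```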

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
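
A minimal sketch of the idea, under the assumption that each target catalogue can be represented by a search function returning (ISBN, title) pairs. The catalogues below are stand-ins invented for illustration, and the sketch queries them one after the other, whereas real meta-search services query the targets in parallel.

```python
def meta_search(query: str, catalogues: dict) -> dict:
    """Send one query to several catalogues and merge the answers.

    catalogues maps a catalogue name to a search function that returns
    a list of (isbn, title) tuples. A catalogue that fails is skipped,
    which is why the result depends on the availability of the targets.
    Returns isbn -> {"title": ..., "found_in": [catalogue names]}.
    """
    merged = {}
    for name, search in catalogues.items():
        try:
            hits = search(query)
        except Exception:
            continue  # target system unavailable: carry on with the rest
        for isbn, title in hits:
            entry = merged.setdefault(isbn, {"title": title, "found_in": []})
            entry["found_in"].append(name)
    return merged


# Hypothetical stand-ins for a library catalogue, a bookshop database
# and a catalogue that happens to be offline.
def library_a(query):
    return [("111", "Marine Ecology")]


def bookshop_b(query):
    return [("111", "Marine Ecology"), ("222", "Ocean Currents")]


def library_c(query):
    raise ConnectionError("catalogue offline")


results = meta_search(
    "marine",
    {"Library A": library_a, "Bookshop B": bookshop_b, "Library C": library_c},
)
print(results)
```

Only the query string is passed on to every target, so only search options that all targets share can be offered; this is one reason why the search options of such services are limited.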

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general: Library of Congress, British Library, Infoball
To find the price of a book: Global Books in Print, Infoball, online bookshops
To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
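
The following sketch illustrates this use of a thesaurus. The tiny thesaurus, its entries and the function name `expand_query` are invented for illustration, and the `OR` operator is an assumption: the query syntax differs from one search system to another.

```python
# Toy thesaurus; the entries are invented for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],        # synonyms
        "NT": ["lagoon"],       # narrower terms
        "BT": ["water body"],   # broader terms
        "RT": ["coast"],        # related terms
    },
}


def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Build an OR query from a term and its thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


narrow_query = expand_query("sea")
broad_query = expand_query("sea", relations=("UF", "NT", "BT"))
print(narrow_query)
print(broad_query)
```

Adding synonyms and narrower terms tends to increase recall; replacing a vague term by a more specific one found in the thesaurus tends to increase precision.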

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
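
Both directions can be illustrated with a toy citation graph. The document identifiers and the function names `snowball` and `cited_by` are hypothetical; a real citation index holds the same kind of data for millions of documents.

```python
# Hypothetical citation data: each document maps to the list of
# earlier documents that it cites in its reference list.
REFERENCES = {
    "seed-2000": ["a-1995", "b-1990"],
    "a-1995": ["b-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed: str, depth: int = 2) -> set:
    """Follow reference lists backwards in time, up to `depth` steps."""
    found = set()
    frontier = {seed}
    for _ in range(depth):
        cited = {ref for doc in frontier for ref in REFERENCES.get(doc, [])}
        frontier = cited - found  # only documents not seen before
        found |= frontier
    return found


def cited_by(target: str) -> list:
    """What a citation index provides: later documents citing `target`."""
    return sorted(doc for doc, refs in REFERENCES.items() if target in refs)


older = snowball("seed-2000")
newer = cited_by("seed-2000")
print(sorted(older))
print(newer)
```

The length of the list returned by `cited_by` is the number of citations received by the seed document within this toy data set.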

211

***-

Information retrieval:
using citations (scheme)
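[Diagram: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document back to older documents (Past); citation indexing leads forward to more recent documents that cite the seed document (Future).]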

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
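
The variants listed above can be generated automatically. A minimal sketch, in which the function name `url_variants` is hypothetical and only the three spellings shown on this slide are produced:

```python
def url_variants(url: str) -> list:
    """Return the common spellings under which a page may be linked."""
    base = url
    # Strip a trailing "/index.html" or "/" to obtain the bare address.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant is then submitted as a separate link search, because a search system may treat them as different addresses.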

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
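
Such a phrase query can be prepared as follows. This is a sketch under assumptions: the address `http://search.example/search` and the parameter name `q` are placeholders and do not refer to a real search engine; only the quoting and the URL encoding are the point here.

```python
from urllib.parse import urlencode


def phrase_query(page_url: str) -> str:
    """Wrap the URL in double quotes so it is searched as an exact phrase."""
    return '"' + page_url + '"'


def search_link(engine_base: str, page_url: str) -> str:
    """Build a link that submits the phrase query to a search system.

    engine_base and the parameter name "q" are assumptions; check the
    help pages of the search system for the real values.
    """
    return engine_base + "?" + urlencode({"q": phrase_query(page_url)})


target = "http://www.computer.com/directory/subdirectory/web-page.html"
phrase = phrase_query(target)
link = search_link("http://search.example/search", target)
print(phrase)
print(link)
```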

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 126

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
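
The mechanism described above (collected pages, a machine-made word index, a search function) can be illustrated with a minimal sketch. The page URLs and texts below are hypothetical stand-ins for what a crawler would collect, and the function names are chosen only for this illustration; real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))  # ['http://example.org/a']
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can answer quickly but may be out of date.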

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
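
The two ranking criteria above can be illustrated with a simplified link-analysis sketch in the style of PageRank. This is not Google's actual algorithm; the link graph, the damping value and the function name are assumptions made only for this illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it points to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A and B both point to C, and C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, C scores higher than A and B because two pages point to it, and D scores high although only one page points to it, because that page (C) is itself "important".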

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
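
What such an automatic expansion amounts to can be sketched as follows. The mini-thesaurus and the function name are hypothetical, and how Google actually selects synonyms is not described here; the sketch only shows the principle of replacing a marked word by an OR group.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word prefixed with ~ into an OR group of its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(base)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("used ~car prices"))
# used (car OR automobile OR vehicle) prices
```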

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
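
Recall and precision, the two measures mentioned above, can be computed as in the following minimal sketch. The document identifiers are hypothetical, and in practice the complete set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(precision, round(recall, 2))  # 0.5 0.67
```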

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
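
To make clear what manual truncation means in systems that do support it, here is a minimal sketch. The vocabulary, the * symbol as truncation mark and the function name are assumptions for this illustration only.

```python
# Hypothetical vocabulary of an index (the words that occur in the documents).
VOCABULARY = {"librarian", "libraries", "library", "liberty", "search"}

def expand_truncation(term, vocabulary):
    """Return the indexed words matched by a term; a trailing * truncates."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        return sorted(w for w in vocabulary if w.startswith(stem))
    return [term] if term in vocabulary else []

print(expand_truncation("librar*", VOCABULARY))
# ['librarian', 'libraries', 'library']
```

Without truncation, the searcher has to type each of these word forms explicitly and combine them with OR.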

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
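
The idea of clustering results on the fly, using only the retrieved titles, can be sketched as follows. This is a deliberately naive stand-in and not the method used by Vivisimo: each result is simply grouped under the non-query word it shares with most other results. The titles, the stop word list and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "for", "of", "the", "to"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def candidates(title):
        words = set(re.findall(r"\w+", title.lower()))
        return words - query_words - STOPWORDS

    # Count in how many titles each candidate word occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(candidates(title))

    clusters = defaultdict(list)
    for title in titles:
        words = candidates(title)
        if words:
            # Most widely shared word first; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

TITLES = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Thoughts by the philosopher Blaise Pascal",
    "Free Pascal compiler for the Pascal programming language",
]
for label, group in cluster_results("pascal", TITLES).items():
    print(label, group)
```

With these titles, the results about the person and the results about the programming language end up in two separate groups, without any processing of the documents before the search.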

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
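
Such an integration with removal of repeated results can be sketched as follows. The two result lists are hypothetical, and the rule used here to decide that two URLs are the same (host in lower case, optional "www." prefix and trailing slash ignored) is an assumption for this illustration; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key; the path stays case-sensitive."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical results returned by two primary search systems.
engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://bubl.ac.uk/link/"]
engine_b = ["http://bubl.ac.uk/link", "http://www.lii.org/"]
print(merge_results([engine_a, engine_b]))
```

Interleaving by rank position keeps the top results of each primary system near the top of the combined list; the second occurrence of the BUBL address is dropped.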

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet with the following parts:
• telnet, ftp, ...
• WWW: static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP, ...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
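
As an illustration of how such change tracking can work in principle, here is a minimal sketch in Python. It assumes that the tracker stores a fingerprint (a hash) of the page text and compares it with the fingerprint of a later version. The page texts and function names are hypothetical; this is not how any of the services named in these slides is known to work.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint: str, current_text: str) -> bool:
    """True when the current page text no longer matches the stored fingerprint."""
    return fingerprint(current_text) != stored_fingerprint


# Hypothetical page contents; a real tracker would download the page periodically.
old_version = "New books on marine science:\n  none yet"
new_version = "New books on marine science:\n  one title added"

stored = fingerprint(old_version)
print(has_changed(stored, old_version))
print(has_changed(stored, new_version))
```

A server-based alert service would run such a comparison on a schedule and send an email only when a change is detected.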

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?
143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
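
The matching of new books against a stored interest profile can be sketched as follows. This is a minimal illustration in Python, assuming a profile that consists of keywords and subject categories as described above. The book records, field names and the function `matches_profile` are invented for the example and do not describe the system of any particular bookshop.

```python
def matches_profile(book: dict, keywords: set, categories: set) -> bool:
    """True when a title word is a profile keyword or the category is in the profile."""
    title_words = {word.strip(".,:;!?()").lower() for word in book["title"].split()}
    return bool(title_words & keywords) or book["category"] in categories


# Hypothetical interest profile stored on the server for one user.
profile_keywords = {"oceanography", "aquaculture"}
profile_categories = {"Marine science"}

# Hypothetical records of newly published books.
new_books = [
    {"title": "Introduction to Aquaculture", "category": "Agriculture"},
    {"title": "Medieval Trade Routes", "category": "History"},
    {"title": "Tides and Currents", "category": "Marine science"},
]

alerts = [
    book["title"]
    for book in new_books
    if matches_profile(book, profile_keywords, profile_categories)
]
print(alerts)
```

The first title matches on a keyword and the third on a subject category, so both would trigger an alert.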

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
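
The principle of simultaneous searching can be sketched as follows. This is a minimal illustration in Python, assuming that every target system can be called as a function that returns a list of (title, author) records. The three target functions and their records are invented stand-ins; real services would query the catalogues over the network, for instance with Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library(query):
    """Hypothetical library catalogue."""
    records = [("Sea Birds", "Author A"), ("River Fish", "Author B")]
    return [r for r in records if query.lower() in r[0].lower()]


def search_bookshop(query):
    """Hypothetical bookshop database."""
    records = [("Sea Birds", "Author A"), ("Deep Sea Life", "Author C")]
    return [r for r in records if query.lower() in r[0].lower()]


def search_offline(query):
    """Hypothetical target system that is not reachable."""
    raise ConnectionError("target system not reachable")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results without duplicates."""
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    for name, future in futures.items():
        try:
            for record in future.result():
                if record not in merged:
                    merged.append(record)
        except ConnectionError:
            failed.append(name)  # the result depends on the availability of the targets
    return merged, failed


targets = {
    "library": search_library,
    "bookshop": search_bookshop,
    "offline_catalogue": search_offline,
}
results, unavailable = meta_search("sea", targets)
print(results)
print(unavailable)
```

The sketch also shows the limitations mentioned above: only a simple query that all targets understand can be sent, and a target that does not respond contributes nothing to the merged result.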

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol
163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
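
The use of thesaurus relations to formulate a query can be sketched as follows. This is a minimal illustration in Python, assuming a thesaurus stored as a dictionary with the relations BT, NT, RT and UF. The entry for "sea" and the function `expand_query` are invented for the example, and the OR syntax is only an assumption: the query syntax required differs from one search system to another.

```python
# Hypothetical thesaurus entry; a real thesaurus contains many thousands of terms.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its synonyms and narrower terms into one OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms (UF) and narrower terms (NT) aims at higher recall, while replacing a vague term by a narrower one aims at higher precision.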

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
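
Both directions of citation searching can be sketched as follows. This is a minimal illustration in Python, assuming that the citations are available as a dictionary that maps each document to the documents it cites. The document identifiers and the functions `snowball` and `cited_by` are invented for the example; a real citation index is far larger and is built by the database producer.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Follow the references of a document backwards in time (snowball effect)."""
    found, queue = [], list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(references.get(current, []))
    return found


def cited_by(doc, references):
    """Find the more recent documents that cite a document (citation index)."""
    return sorted(d for d, refs in references.items() if doc in refs)


print(snowball("seed", REFERENCES))
print(cited_by("seed", REFERENCES))
print(len(cited_by("seed", REFERENCES)))
```

The length of the list returned by `cited_by` is the number of citations received by the document within this data set.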

211

***-

Information retrieval:
using citations (scheme)
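[Scheme: a time axis running from Past through Now to Future, with the seed document on it.
Snowball citation searching leads from the seed document back to older publications.
Citation indexing leads from the seed document forward to more recent publications that cite it.]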

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
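
The variants listed above can be generated automatically. This is a minimal sketch in Python, assuming that the known URL ends in /index.html. The function `url_variants` is invented for the example and the URL is the placeholder used on this slide; how each variant must be entered in a link search depends on the search system.

```python
def url_variants(url: str) -> list:
    """Return the forms under which other pages may link to the same web site."""
    without_scheme = url.split("://", 1)[-1]
    variants = [without_scheme]
    if without_scheme.endswith("/index.html"):
        directory = without_scheme[: -len("index.html")]
        variants.append(directory)              # with trailing slash
        variants.append(directory.rstrip("/"))  # without trailing slash
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be searched separately, and the results combined, to avoid missing pages that link to the site in a different form.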

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 127

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
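
The mechanism described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical set of pages that a "robot" has already collected (the example.org URLs and texts are invented) and builds a simple word index from them. Real Internet indexes are far larger and use more refined techniques.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Fishing statistics for the North Sea",
}

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "north sea")))
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.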

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
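
The idea of ranking by links can be sketched as follows. This is a simplified PageRank-style calculation on an invented link graph (pages A, B, C, D and F are hypothetical). It only illustrates the principle and should not be read as the algorithm actually used by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score computed by repeated redistribution.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: D and F each receive exactly one link,
# but D is linked from C, which itself receives two links.
LINKS = {"A": ["C", "F"], "B": ["C"], "C": ["D"], "D": [], "F": []}
RANK = link_rank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this sketch D ends up above F although both receive a single link, because the page pointing to D is itself well linked.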

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
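
What such an automatic expansion amounts to can be sketched in a few lines. The synonym list below is hypothetical and tiny, and the sketch does not claim to reproduce how Google selects its synonyms. It only shows the principle of replacing a marked word by an OR group.

```python
# Hypothetical synonym list; a real system would rely on a much larger thesaurus.
SYNONYMS = {
    "boat": ["ship", "vessel"],
    "fish": ["fishes"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word marked with a leading tilde into an OR group."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the word itself.
                parts.append(base)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("north sea ~boat traffic"))
# north sea (boat OR ship OR vessel) traffic
```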

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
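
Recall and precision can be made concrete with a small calculation. The document identifiers and the two result sets below are invented for illustration; they are not measurements of any real search system.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / all retrieved; recall = relevant hits / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents are relevant to the information need.
RELEVANT = {"d1", "d2", "d3", "d4"}
GENERAL_RESULTS = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
SPECIALISED_RESULTS = ["d1", "d2", "d3", "x1"]

print(precision_recall(GENERAL_RESULTS, RELEVANT))      # (0.25, 0.5)
print(precision_recall(SPECIALISED_RESULTS, RELEVANT))  # (0.75, 0.75)
```

In this invented case the specialised system retrieves fewer documents in total, but a larger share of them is relevant and a larger share of the relevant documents is found.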

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
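
To make two of these missing features concrete, the following sketch shows what truncation and a proximity (NEAR) operator do when applied to a single text. The function names and the sample sentence are assumptions made for illustration only; they do not correspond to the syntax of any particular retrieval system.

```python
import re

def words(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncation(text, pattern):
    """True if any word starts with the stem of a pattern such as 'librar*'."""
    stem = pattern.rstrip("*").lower()
    return any(word.startswith(stem) for word in words(text))

def near(text, first, second, max_distance=3):
    """True if the two words occur within max_distance word positions of each other."""
    tokens = words(text)
    first_positions = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_positions = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

TEXT = "Libraries offer open access to marine science journals"
print(matches_truncation(TEXT, "librar*"))   # True: matches "libraries"
print(near(TEXT, "marine", "journals"))      # True: 2 positions apart
print(near(TEXT, "open", "journals"))        # False: 5 positions apart
```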

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
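
The relations in this scheme can be sketched as a small program in which one meta-search function forwards the same query to several search systems in parallel. The two search functions below are hypothetical stand-ins that return fixed, invented results; a real meta-search system would contact the search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems;
# each returns a ranked list of URLs from its own collected database.
def search_system_1(query):
    return ["http://example.org/a", "http://example.org/b"]

def search_system_2(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, systems):
    """Send the same query to every search system in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        return {name: future.result() for name, future in futures.items()}

SYSTEMS = {"system 1": search_system_1, "system 2": search_system_2}
RESULTS = meta_search("marine biology", SYSTEMS)
for name, urls in RESULTS.items():
    print(name, urls)
```

Running the searches in parallel threads is what the term "multi-threaded search system" refers to: the user waits roughly as long as the slowest search system, not as long as all of them together.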

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
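
Clustering on the fly can be illustrated with a deliberately naive sketch: the retrieved snippets are grouped under words that several of them share, without any processing before the search. The snippets for the ambiguous query "pascal" are invented, and this heuristic is an assumption for illustration; it is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets, max_clusters=3):
    """Group snippets under the most frequent shared words (a naive heuristic)."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    snippet_words = []
    for snippet in snippets:
        found = set(re.findall(r"[a-z]+", snippet.lower()))
        snippet_words.append(found - query_words - STOP_WORDS)
    # Count in how many snippets each word occurs.
    frequency = Counter(word for found in snippet_words for word in found)
    # Candidate headings: words shared by at least two snippets,
    # most frequent first, ties broken alphabetically.
    headings = sorted(
        (word for word, count in frequency.items() if count >= 2),
        key=lambda word: (-frequency[word], word),
    )[:max_clusters]
    clusters = defaultdict(list)
    for snippet, found in zip(snippets, snippet_words):
        heading = next((h for h in headings if h in found), "other")
        clusters[heading].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal unit of pressure",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for heading, members in CLUSTERS.items():
    print(heading, members)
```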

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than only 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
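
How such an integration with removal of repeated results might work is sketched below. The two result lists are invented, and the normalisation rules (ignoring "www.", upper/lower case in the host name and a trailing slash) are assumptions made for this sketch; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivial URL variants: case of the host, 'www.' prefix, trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query

def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
LIST_1 = ["http://www.vivisimo.com/", "http://dogpile.com/"]
LIST_2 = ["http://vivisimo.com", "http://www.kartoo.com"]
MERGED = merge_results([LIST_1, LIST_2])
print(MERGED)
```

The round-robin order keeps the top result of each primary system near the top of the merged list, which is one simple way to combine rankings that are not directly comparable.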

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet comprises the WWW as well as telnet, ftp, ...
Within this scheme:
- Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
- Word files
- PDF files
- Databases and file archives accessible through the Internet (CGI, ASP,...)
- Information accessible only when passwords are used
- Rapidly changing information, such as news]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
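How such a tracking service could decide that a page is new or has changed can be sketched as follows. This is a minimal sketch under stated assumptions: the functions `fingerprint` and `check_for_updates` are hypothetical, the page texts are passed in directly instead of being fetched from the WWW, and the step of sending an alert by email is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a hash of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_updates(stored, current_pages):
    """Compare current pages with stored fingerprints and report differences.

    `stored` maps a URL to the fingerprint seen at the previous check and is
    updated in place; the return value lists (url, "new" or "changed").
    """
    alerts = []
    for url, text in current_pages.items():
        fp = fingerprint(text)
        if url not in stored:
            alerts.append((url, "new"))
        elif stored[url] != fp:
            alerts.append((url, "changed"))
        stored[url] = fp
    return alerts


# Three successive checks of one hypothetical page.
stored = {}
first = check_for_updates(stored, {"http://example.org/news": "Version 1"})
second = check_for_updates(stored, {"http://example.org/news": "Version   1"})
third = check_for_updates(stored, {"http://example.org/news": "Version 2"})
```

The second check reports nothing because only the spacing differs; a real service would also have to decide which changes are relevant enough to alert the subscriber.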

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
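The idea of one search action sent in parallel to several target systems, where the result depends on which targets are available, can be sketched as follows. This is an illustration only: `search_all` and the two target functions are hypothetical stand-ins, and no real catalogue or Z39.50 connection is used.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(targets, query):
    """Send one query to several catalogues in parallel.

    `targets` maps a catalogue name to a function that takes the query and
    returns a list of titles. Targets that fail are reported separately.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=10)
            except Exception:
                # An unavailable or slow target does not block the others.
                failed.append(name)
    return results, failed


def library_a(query):
    # Hypothetical catalogue holding two titles.
    titles = ["Marine ecology", "Ocean currents"]
    return [t for t in titles if query.lower() in t.lower()]


def bookshop_b(query):
    # Hypothetical target that is currently unreachable.
    raise ConnectionError("target system unavailable")


results, failed = search_all({"library_a": library_a, "bookshop_b": bookshop_b}, "ocean")
```

Only a plain keyword is passed to every target, which reflects the limited search options mentioned above: the common interface can offer only what all targets understand.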

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a given Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
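The procedure described above (first look up a term in a thesaurus, then use the terms found to formulate the query) can be sketched as follows. The mini-thesaurus, the function `expand_query`, and the `OR` query syntax are assumptions made for this illustration; real entries would come from a thesaurus system, and the query syntax depends on the database that is searched.

```python
# Hypothetical mini-thesaurus with the relations UF, NT, BT and RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon"],
        "BT": ["water body"],
        "RT": ["coast"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and its related thesaurus terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = expand_query("sea", THESAURUS)
broader = expand_query("sea", THESAURUS, relations=("UF", "BT"))
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable, more specific term from the thesaurus instead of the original word is what helps precision.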

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
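The two directions of citation searching can be sketched with a small example. The document identifiers and reference lists below are invented for illustration, and the functions `build_citation_index` and `snowball` are hypothetical; they only show the principle, not how any existing citation index is produced.

```python
from collections import defaultdict

# Hypothetical data: each document lists the older documents that it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
    "old-1995": ["old-1990"],
}


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return cited_by


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed."""
    found, todo = [], list(references.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(references.get(doc, []))
    return found


cited_by = build_citation_index(REFERENCES)
older = snowball("seed-2000", REFERENCES)   # towards the past
newer = cited_by["seed-2000"]               # towards more recent work
```

The length of `cited_by[...]` for a document is the number of citations that it has received within this small collection.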

211

***-

Information retrieval:
using citations (scheme)
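[Scheme: a time axis running from Past through Now to Future.
Starting from the seed document:
- snowball citation searching leads back in time to older publications;
- citation indexing leads forward in time to more recent publications that cite the seed document.]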

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
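Generating the variants listed above from one known address can be sketched as follows. The function `url_variants` is hypothetical, and the list of default page names (`index.html`, `index.htm`) is an assumption; a web server may use other names.

```python
def url_variants(url):
    """Return common spellings of the same address, for use in link searches."""
    # Drop the scheme (such as http) and keep the host and path.
    rest = url.split("://", 1)[-1]
    variants = [rest]
    # Assumed default page names; strip one if the address ends with it.
    for default_page in ("index.html", "index.htm"):
        if rest.endswith("/" + default_page):
            rest = rest[: -len(default_page)]
            variants.append(rest)
            break
    # Add the form with or without the trailing slash.
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    else:
        variants.append(rest + "/")
    return variants


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be entered in the link search of the chosen search system, using the query syntax that its help pages prescribe.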

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
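
The simplified rule stated above ("the more links received by a site, the higher it ranks") can be sketched in a few lines of Python. This is only an illustration of that simplified statement, not Google's actual method; the site names and the link list are invented for the example.

```python
from collections import Counter

def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of links they receive.

    entries: list of site names listed in one directory category
    links:   list of (source, target) pairs found on the WWW
    Ties are broken alphabetically, as in a plain alphabetical directory.
    """
    counts = Counter(target for _, target in links)
    return sorted(entries, key=lambda site: (-counts[site], site))

# Hypothetical directory category and hypothetical link data.
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("z.example", "beta.example"),
]

print(rank_by_inlinks(ENTRIES, LINKS))
```

With these invented data the alphabetical order is reversed, because gamma.example receives the most links.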

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
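
The two components described above (a machine-made index built from the texts of collected pages, and a search function on top of it) can be sketched as follows. This is a minimal sketch under stated assumptions: the three pages and their URLs are hypothetical stand-ins for what a "robot" would collect, and real systems add ranking, updating and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science portals",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the index alone; the pages themselves are not contacted at search time, which is the point made on the previous slide.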

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
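
The two ranking factors named above (many pages point to a page, and "important" pages point to it) can be illustrated with a small link-analysis sketch. This is a simplified iterative scheme in the spirit of published link-analysis methods, not Google's actual algorithm; the damping value, the number of iterations and the four-page link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a link-based score for each page.

    links maps each page to the list of pages it points to.
    Every page repeatedly passes a share of its score to its targets,
    so a link from a high-scoring page is worth more than a link
    from a low-scoring one.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all point to C; C points back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this invented graph C scores highest because many pages point to it, and A scores higher than B or D because its single incoming link comes from the "important" page C.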

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
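
What such an automatic expansion amounts to can be sketched as follows: each word marked with a tilde is replaced by an OR-group of the word and its synonyms. This is a hedged illustration only; how Google implements the operator internally is not described here, and the mini-thesaurus below is invented for the example.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the word itself.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
```

Only the marked word is expanded; the other words in the query are left as typed.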

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
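
Recall and precision, the two measures mentioned above, can be computed as follows once it is known which documents are relevant. This is a minimal sketch with invented document identifiers: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents actually relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).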

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
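
For readers unfamiliar with truncation, the following sketch shows what a manual truncation operator does in systems that offer one: a pattern such as librar* matches every indexed word that starts with "librar". The asterisk syntax and the word list are assumptions made for the example.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a pattern with optional right truncation.

    'librar*' matches every word that starts with 'librar';
    a pattern without '*' matches only the identical word.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical list of words occurring in an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]

print(truncation_match("librar*", VOCABULARY))
print(truncation_match("library", VOCABULARY))
```

Without such an operator, the searcher has to type the variants (library, libraries, librarian) one by one.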

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
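
A proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched as a check on word positions. The window of 3 words and the sample sentence are assumptions made for the example; real systems differ in how they count distance.

```python
import re

def near(text, word_a, word_b, max_distance=3):
    """Return True when word_a and word_b occur within max_distance words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

TEXT = "open access to scientific information"

print(near(TEXT, "access", "information"))
print(near(TEXT, "open", "information"))
```

A plain Boolean AND would accept both pairs; the proximity check accepts only the pair whose words are close together.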

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
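
Grouping results on the fly can be illustrated with a very crude sketch: the retrieved snippets are grouped under words that they share, apart from the query words themselves. This is NOT Vivisimo's method, which is not described here; it only shows that headings can be derived from the result set at search time, without pre-processing the whole collection. The snippets and the stop-word list are invented.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "his"}

def cluster_results(query, snippets, min_size=2):
    """Group snippets under every non-query word shared by several of them."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    groups = defaultdict(list)
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        for word in words - query_words - STOPWORDS:
            groups[word].append(snippet)
    # Keep only headings that cover at least min_size snippets.
    return {label: docs for label, docs in groups.items() if len(docs) >= min_size}

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Compilers for the Pascal programming language",
]

clusters = cluster_results("pascal", SNIPPETS)
for label in sorted(clusters):
    print(label, len(clusters[label]))
```

With these invented snippets the person and the programming language end up under different headings, which is the kind of help with ambiguity described above.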

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
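
The integration step mentioned above (merging the result lists of several primary search systems and removing repeated results) can be sketched as follows. The normalisation rule, which treats addresses with and without "www." and with or without a trailing slash as the same page, is an assumption made for the example, as are the two result lists.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeats."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results returned by two primary search systems.
ENGINE_A = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/x"]
ENGINE_B = ["http://vub.ac.be/BIBLIO", "http://example.org/y"]

print(merge_results([ENGINE_A, ENGINE_B]))
```

Interleaving by rank position is only one possible choice; a real meta-search system may weight the engines differently.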

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet includes the WWW as well as other services (telnet, ftp, ...). Internet indexes cover only partly the static texts in the WWW that can be indexed (= on HTTP server computers). Other elements shown in the scheme: databases and file archives accessible through the Internet (CGI, ASP, ...), information accessible only when passwords are used, rapidly changing information such as news, Word files, and PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
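
The difference between a modified page and a new page can be illustrated with a small Python sketch. It assumes that the page texts have already been fetched by some other means, and it compares a stored fingerprint of each page with the current one. The URLs and texts are hypothetical, and actual alert services may detect changes in other ways.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text with whitespace collapsed."""
    collapsed = " ".join(page_text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def check_pages(stored: dict[str, str], fetched: dict[str, str]) -> dict[str, list[str]]:
    """Compare freshly fetched pages with the fingerprints of the previous run.

    stored maps URL -> fingerprint from the previous run;
    fetched maps URL -> current page text.
    The stored dict is updated in place, ready for the next run.
    """
    report: dict[str, list[str]] = {"new": [], "modified": [], "unchanged": []}
    for url, text in fetched.items():
        digest = fingerprint(text)
        if url not in stored:
            report["new"].append(url)
        elif stored[url] != digest:
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
        stored[url] = digest
    return report


# Hypothetical example: one known page has changed, one page is new
stored = {"http://example.org/a": fingerprint("Tide tables 2004")}
report = check_pages(stored, {
    "http://example.org/a": "Tide tables 2005",
    "http://example.org/b": "New page on currents",
})
```

An alert service would then send the "new" and "modified" lists to the subscriber, for instance by email.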

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic: ?

To find book titles published before 1990: ?

To find a book title through a title search: ?

To find the price of a book: ?

To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
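
How the result depends on the availability of the target systems can be shown with a minimal Python sketch. It sends one query to several targets in parallel and records which targets failed. The two catalogue functions are hypothetical stand-ins; a real service would contact library systems, for instance through Z39.50 or web forms, and would also need time limits for slow targets.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(targets, query):
    """Send one query to every target in parallel.

    targets maps a catalogue name to a callable that takes the query
    and returns a list of titles. Returns (hits, failed), where failed
    lists the targets that raised an error.
    """
    hits = {}
    failed = []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result()
            except Exception:
                # An unavailable target must not break the whole search
                failed.append(name)
    return hits, failed


# Hypothetical target systems, for illustration only
def catalogue_a(query):
    return [f"{query}: a handbook"]


def catalogue_b(query):
    raise ConnectionError("target unavailable")


hits, failed = search_all(
    {"catalogue_a": catalogue_a, "catalogue_b": catalogue_b},
    "oceanography",
)
```

The user still receives the hits from the catalogue that answered, together with a note that the other target could not be searched.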

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
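
The procedure described above can be sketched in Python as a simple query expansion. The thesaurus entry below is a hypothetical example, not taken from any real thesaurus system, and the Boolean OR syntax is an assumption that must be checked against the manual of the database to be searched. Adding synonyms and narrower terms with OR mainly helps recall; choosing the most suitable terms helps precision.

```python
# Hypothetical thesaurus entry, for illustration only
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonym(s)
        "NT": ["coastal waters"],  # narrower term(s)
        "BT": ["water body"],      # broader term(s)
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and its selected thesaurus relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so they are searched as phrases
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = expand_query("sea", THESAURUS)
```

A term that is not present in the thesaurus is simply returned unchanged.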

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
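
Both directions of citation searching can be illustrated with a small Python sketch. It assumes that the reference lists are available as a simple mapping from each document to the documents it cites. The document identifiers are hypothetical, and real citation indexes are of course built from much larger and less clean data.

```python
def cited_by_index(references):
    """Invert 'document -> documents it cites' into 'document -> documents citing it'."""
    index = {}
    for doc, cited in references.items():
        for target in cited:
            index.setdefault(target, []).append(doc)
    return index


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


# Hypothetical reference lists: document -> documents cited in it
REFERENCES = {
    "seed_2000": ["a_1995", "b_1998"],
    "b_1998": ["a_1995", "c_1990"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "d_2003"],
}

older = snowball("seed_2000", REFERENCES)                   # older publications
newer = cited_by_index(REFERENCES).get("seed_2000", [])     # more recent, citing publications
```

The length of the "newer" list is the number of citations received by the seed document within this small data set.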

211

***-

Information retrieval:
using citations (scheme)
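[Scheme: a time axis running from past to future, with the seed document at "now". Snowball citation searching leads from the seed document back to older publications (past); citation indexing leads from the seed document forward to more recent publications that cite it (future).]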

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
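
Generating the variants listed above can be sketched in Python as follows. The sketch assumes that "index.html" is the default page of the site, which is not true for every web server, and the example URL is hypothetical. Each variant would then be submitted as a separate link query to the search system.

```python
DEFAULT_PAGE = "index.html"  # assumed default file name of the web server


def url_variants(url: str) -> list[str]:
    """Return the three common spellings under which a site may be linked."""
    base = url.strip()
    if base.endswith("/" + DEFAULT_PAGE):
        base = base[: -len(DEFAULT_PAGE)]
    base = base.rstrip("/")
    return [base + "/" + DEFAULT_PAGE, base + "/", base]


variants = url_variants("http://www.example.org/website/index.html")
```

The same three variants are produced whether the starting URL ends in the default page, in a slash, or in neither.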

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 129

1


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
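The mechanism described above (a robot collects pages, an index is built from their text, and queries are answered from that index rather than from the live Internet) can be illustrated with a minimal sketch. The page texts and example.org URLs are hypothetical, and the functions `tokenize`, `build_index` and `search` are names chosen only for this illustration; real search engines are far more elaborate.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science courses",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))
```

The query is answered by looking up each word in the prepared index, which is why a search is fast but can only find pages that the robot has already collected.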

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
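The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified link-analysis calculation in the style of PageRank. This is only an illustration under assumptions: the damping value 0.85, the number of iterations and the tiny link graph are chosen for the example, and the algorithm actually used by Google is not described in these slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute a simple link-based rank.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, in equal shares, to its targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example C ranks highest because three pages point to it, and A ranks above B and D because the single page pointing to it (C) is itself "important".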

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
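What such an automatic expansion amounts to can be sketched as follows: every word marked with a tilde is replaced by an OR-group of the word and its synonyms. The synonym table and the function name `expand_query` are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("~cheap hotel brussels"))
```

The same OR-groups can of course be typed by hand, which corresponds to the manual expansion of a query with the help of a thesaurus.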

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
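The terms recall and precision used on this slide can be made concrete with a small calculation: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that have been retrieved. The document identifiers below are hypothetical, and `precision_recall` is a name chosen only for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 5 documents are really relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print("precision:", precision)
print("recall:", recall)
```

A specialised search system can improve both values at once, because its smaller, focused database contains less irrelevant material and may cover its subject area more completely.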

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
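To show what truncation means in a retrieval system that does support it, the sketch below matches a right-truncated pattern such as librar* against a word list. The vocabulary and the function name `truncation_match` are hypothetical; the asterisk as truncation symbol is an assumption, as the symbol differs between systems.

```python
def truncation_match(pattern, vocabulary):
    """Return the words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical vocabulary of words that occur in an index.
vocabulary = ["library", "libraries", "librarian", "liberty", "archive"]
matches = truncation_match("librar*", vocabulary)
print(matches)

# In a system without truncation, the variants can be combined by hand.
print(" OR ".join(matches))
```

The second printed line shows the usual workaround in a system without truncation: the searcher lists the word variants explicitly and combines them with OR.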

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Diagram with the components: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram with the components: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
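Clustering on the fly can be illustrated with a very simple sketch that groups the titles of retrieved results under the term they share most often with other results. This is not the method used by Vivisimo, which is not described in these slides; the stop word list, the result titles and the function names are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Small hypothetical stop word list.
STOPWORDS = {"the", "of", "by", "for", "a", "an", "and", "in", "to"}


def terms(title, query_words):
    """Return the informative words of a title, without the query words."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return words - STOPWORDS - query_words


def cluster(titles, query):
    """Group result titles under the term they share most with other results."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    doc_terms = [terms(t, query_words) for t in titles]
    # Count in how many results each term occurs.
    df = Counter(term for ts in doc_terms for term in ts)
    clusters = defaultdict(list)
    for title, ts in zip(titles, doc_terms):
        if ts:
            # Most widely shared term; ties are broken alphabetically.
            label = min(ts, key=lambda term: (-df[term], term))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical results retrieved for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Thoughts of the philosopher Blaise Pascal",
    "Free Pascal compiler for the programming language",
]
for label, group in cluster(titles, "pascal").items():
    print(label, "->", group)
```

Only the retrieved results are analysed, at the moment of the search, so no pre-processing of the complete document collection is needed.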

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the components: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
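The integration of results with removal of repeated results can be sketched as follows. The ranked lists from the individual search systems are interleaved, and a URL that was already seen is skipped. The example URLs and the function name `merge_results` are hypothetical, and treating two URLs as equal when they differ only by a trailing slash is a simplifying assumption; real meta-search systems may merge and rank differently.

```python
def merge_results(result_lists):
    """Interleave ranked URL lists from several engines and drop duplicates."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                # Simplification: ignore only a trailing slash when comparing.
                key = url.rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://example.org/x", "http://example.org/y/"]
engine_b = ["http://example.org/y", "http://example.org/z"]
print(merge_results([engine_a, engine_b]))
```

Interleaving by position keeps the top results of every search system near the top of the merged list, so that no single system dominates the result.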

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: sources available through the Internet]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Other Internet services: telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
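
The difference between "modified" and "new" can be illustrated with a minimal sketch. It assumes that the page texts have already been fetched and that the service keeps a stored fingerprint per URL; the function names and the use of a SHA-256 digest are choices made for this example, not a description of any particular alert service.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(current_pages, known_fingerprints):
    """Compare pages against the fingerprints stored at the previous check.

    current_pages: dict mapping URL -> page text fetched now
    known_fingerprints: dict mapping URL -> digest from the previous check
    Returns (new_urls, modified_urls) and updates known_fingerprints.
    """
    new_urls, modified_urls = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in known_fingerprints:
            new_urls.append(url)          # never seen before: a new page
        elif known_fingerprints[url] != digest:
            modified_urls.append(url)     # seen before, content changed
        known_fingerprints[url] = digest
    return new_urls, modified_urls

store = {}
first = check_pages({"http://example.org/a": "version 1"}, store)
second = check_pages({"http://example.org/a": "version 2",
                      "http://example.org/b": "hello"}, store)
```

An alert service would run such a check at regular intervals and send the two lists to the subscriber by email.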

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
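
A minimal sketch of how a stored interest profile could be matched against newly published books. The data layout (a title plus a list of subject categories per book) and the matching rule are assumptions for illustration only; they do not describe how Amazon or any other bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """Return True when a new book fits the stored interest profile.

    book: dict with 'title' (str) and 'subjects' (list of str)
    profile: dict with 'keywords' and 'subjects' (lists of str)
    """
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    subject_hit = bool(set(book["subjects"]) & set(profile["subjects"]))
    return keyword_hit or subject_hit

profile = {"keywords": ["aquaculture"], "subjects": ["Marine biology"]}
new_books = [
    {"title": "Aquaculture in Europe", "subjects": ["Fisheries"]},
    {"title": "Coral reefs", "subjects": ["Marine biology"]},
    {"title": "Medieval history", "subjects": ["History"]},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```

The first book matches on a keyword, the second on a subject category, and the third is not reported.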

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: relations of a term in a thesaurus]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
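
This use of a thesaurus can be illustrated with a minimal sketch. The tiny thesaurus below, its terms and the `OR` query syntax are all assumptions made for the example; the query syntax of a real database should be checked in its help pages.

```python
# A tiny hypothetical thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"],
            "RT": ["ocean"], "UF": ["seas"]},
    "ocean": {"BT": ["water body"], "NT": [],
              "RT": ["sea"], "UF": ["oceans"]},
}

def expand_query(term, relations=("UF", "NT", "RT")):
    """Build an OR query from a term plus its synonyms (UF),
    narrower terms (NT) and related terms (RT)."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea")
```

Adding synonyms and narrower terms mainly increases recall; leaving out the related terms (`relations=("UF", "NT")`) keeps the query closer to the original concept.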

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
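
Both directions can be illustrated with a minimal sketch on a small, invented set of documents. The data and function names are hypothetical; a real citation index such as the ones discussed in this section is far larger and is built by its producer, not by the searcher.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "new1"],
}

def snowball(doc, cites):
    """Follow reference lists backwards in time and collect
    every older document that can be reached from `doc`."""
    found, queue = [], list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found

def build_citation_index(cites):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, []).append(citing)
    return index

older = snowball("seed", CITES)
newer = build_citation_index(CITES).get("seed", [])
```

The length of `newer` is the number of citations received by the seed document within this data set.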

211

***-

Information retrieval:
using citations (scheme)
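[Scheme: on a time axis (past, now, future), snowball citation searching leads from the seed document back to older publications, while citation indexing leads forward to more recent publications that cite the seed document.]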

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
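
A minimal sketch that generates these variants automatically, so that none is forgotten when searching for links. The function name and the default list of index file names are assumptions for the example; a site may use other default file names.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the common spellings of a URL that point to the same page."""
    base = url
    # Strip a known index file name, if the URL ends with one.
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]
            break
    base = base.rstrip("/")
    variants = [base, base + "/"]
    variants += [base + "/" + name for name in index_names]
    return variants

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant can then be used in a separate link search, and the results combined.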

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
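
The contrast between alphabetical ordering and ordering by links received can be sketched as follows. This is an illustration only, not the method actually used by Google: the function name `rank_by_inlinks`, the site names and the link data are all invented.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Order directory entries by the number of links they receive.

    entries: list of site names in one directory category
    links:   list of (source, target) pairs observed on the web
    Ties are broken alphabetically to keep the result stable.
    """
    received = Counter(target for _source, target in links)
    return sorted(entries, key=lambda site: (-received[site], site))


if __name__ == "__main__":
    entries = ["alpha.example", "beta.example", "gamma.example"]
    links = [
        ("x.example", "gamma.example"),
        ("y.example", "gamma.example"),
        ("z.example", "beta.example"),
    ]
    print(sorted(entries))                  # alphabetical order
    print(rank_by_inlinks(entries, links))  # order by links received
```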

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
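
The mechanism described above (pages collected in advance, an index built from their text, queries answered from the index rather than from the live Internet) can be sketched with a tiny inverted index. This is a teaching sketch under simplifying assumptions: the "collected" pages are given as a dictionary instead of being fetched by a robot, and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    pages: dict mapping URL -> page text (stands in for the robot's harvest).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    pages = {
        "http://a.example/": "Marine science and oceanography portal",
        "http://b.example/": "Fishing and marine biology news",
        "http://c.example/": "Civil engineering directory",
    }
    index = build_index(pages)
    print(sorted(search(index, "marine")))
    print(sorted(search(index, "marine biology")))
```

Because the query is answered from the index, a page that changed after the last visit of the robot is still retrieved on the basis of its old text.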

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine Alltheweb, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
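
The two ranking criteria above (many pages point to it, and "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch and not Google's actual production algorithm: the function name `link_rank`, the damping value 0.85 and the toy link graph are assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from the link graph by repeated redistribution of scores.

    links: dict mapping page -> list of pages it points to.
    A page scores higher when many pages, or high-scoring pages, point to it.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = link_rank(links)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this toy graph, page c receives the most links, and page a outranks b and d because its single incoming link comes from the high-scoring page c.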

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
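
What such an automatic expansion amounts to can be sketched as follows. This does not show how Google implements the tilde operator internally; the function name `expand_query` and the small synonym table are hypothetical and serve only to illustrate the principle.

```python
def expand_query(query, synonyms):
    """Replace every ~word by an OR-group of the word and its synonyms.

    synonyms: dict mapping a word -> list of synonyms (a toy thesaurus).
    Words without a tilde, or without known synonyms, are left as they are.
    """
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + [s for s in synonyms.get(word, []) if s != word]
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


if __name__ == "__main__":
    toy_thesaurus = {"car": ["automobile", "auto"]}
    print(expand_query("cheap ~car rental", toy_thesaurus))
```

Only the marked word is broadened; the other words of the query keep their exact form.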

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
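
When a search system does not support truncation, the searcher can write out the word variants and combine them with OR. The following sketch shows what a truncation operator does in systems that offer it; the function name `expand_truncation`, the asterisk as truncation symbol and the word list are assumptions for illustration.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR query.

    vocabulary: iterable of known words (stands in for the index vocabulary).
    A term without a trailing asterisk is returned unchanged.
    """
    if not term.endswith("*"):
        return term
    stem = term[:-1].lower()
    matches = sorted(w for w in vocabulary if w.lower().startswith(stem))
    return " OR ".join(matches) if matches else stem


if __name__ == "__main__":
    words = ["library", "libraries", "librarian", "liberty"]
    print(expand_truncation("librar*", words))
```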

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
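
To make clear what a proximity operator such as NEAR adds to plain Boolean AND, here is a minimal sketch. The function name `near` and the default distance of 5 words are assumptions; real systems define the distance in their own way.

```python
import re


def near(text, word_a, word_b, distance=5):
    """Return True when word_a and word_b occur within `distance` words."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)


if __name__ == "__main__":
    sentence = "open access journals and free electronic archives"
    print(near(sentence, "open", "journals", distance=3))
    print(near(sentence, "open", "archives", distance=3))
```

Both word pairs would satisfy a plain AND query, but only the first pair satisfies the proximity condition.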

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
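
Clustering on the fly can be illustrated with a very simple grouping of result snippets by a word that they share. This is only a toy sketch and says nothing about the method used by Vivisimo: the function name `cluster_results`, the stop word list and the sample snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word they contain.

    Only the retrieved snippets are analysed, at query time; no document
    is processed before the search. Snippets that share no word with any
    other snippet are collected under 'other'.
    """
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_words)
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        shared = [w for w in words if doc_freq[w] >= 2]
        # Ties are broken alphabetically so that the result is deterministic.
        label = max(shared, key=lambda w: (doc_freq[w], w)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    hits = [
        "Blaise Pascal, French philosopher",
        "Pascal the philosopher and mathematician",
        "Pascal programming language tutorial",
        "Learn the Pascal programming language",
        "Pascal restaurant",
    ]
    for label, members in cluster_results("pascal", hits).items():
        print(label, "->", members)
```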

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
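
The client-based scheme above can be sketched in a few lines. This is only a minimal illustration, assuming two hypothetical stand-in functions (`search_system_1`, `search_system_2`) in place of real Internet search systems; it does not describe how Copernic or any other product works internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_system_1(query):
    return [f"http://example.org/{query}/a", f"http://example.org/{query}/b"]

def search_system_2(query):
    return [f"http://example.org/{query}/b", f"http://example.net/{query}/c"]

def meta_search(query, systems):
    """Send the same query to every search system, one thread per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        # pool.map returns the result lists in the same order as `systems`
        result_lists = list(pool.map(lambda system: system(query), systems))
    return dict(zip((s.__name__ for s in systems), result_lists))

if __name__ == "__main__":
    hits_per_system = meta_search("tides", [search_system_1, search_system_2])
    for name, hits in hits_per_system.items():
        print(name, hits)
```

Because the systems are queried in parallel, the total waiting time is roughly that of the slowest system rather than the sum of all of them.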

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
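
The integration with removal of repeated results could look roughly as follows. This is a sketch under stated assumptions: the round-robin interleaving and the URL normalisation rules (`normalise`) are illustrative choices, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked result lists and keep only the first copy of each URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

if __name__ == "__main__":
    engine_a = ["http://www.dogpile.com/", "http://vivisimo.com/"]
    engine_b = ["http://dogpile.com", "http://www.kartoo.com"]
    print(merge_results([engine_a, engine_b]))
```

Interleaving by rank keeps the top hit of every primary system near the top of the merged list.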

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
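
The distinction between modified and new pages can be illustrated with a small sketch. It assumes that the service stores a fingerprint (here a SHA-256 digest) of each page at every visit. How real alert services detect changes is not stated in the source, and fetching the pages is left out.

```python
import hashlib

def fingerprint(page_bytes):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(page_bytes).hexdigest()

def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched content.

    previous: dict mapping URL -> fingerprint saved at the last visit
    current:  dict mapping URL -> page content (bytes) fetched now
    Returns two sorted lists of URLs: (modified, new).
    """
    modified = sorted(url for url, body in current.items()
                      if url in previous and previous[url] != fingerprint(body))
    new = sorted(url for url in current if url not in previous)
    return modified, new

if __name__ == "__main__":
    stored = {"http://example.org/a": fingerprint(b"old text")}
    fetched = {"http://example.org/a": b"new text", "http://example.org/b": b"hello"}
    print(changed_pages(stored, fetched))
```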

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
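
The use of a thesaurus to formulate a query can be sketched as follows. The tiny `THESAURUS` table and its terms are invented for illustration; the relation codes (BT, NT, RT, UF) are the ones defined on the previous slide, and the choice to expand only with UF and NT terms is an assumption.

```python
# A tiny, hypothetical thesaurus entry; real thesaurus systems are far larger.
THESAURUS = {
    "sea": {
        "BT": ["water body"],   # broader term
        "NT": ["coastal sea"],  # narrower term
        "RT": ["tide"],         # related term
        "UF": ["ocean"],        # used for (synonym)
    },
}

def expand(term, thesaurus, relations=("UF", "NT")):
    """Collect the term plus the terms reached through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms

def to_query(terms):
    """Join the terms with OR; multi-word terms are quoted as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

if __name__ == "__main__":
    print(to_query(expand("sea", THESAURUS)))
```

Adding synonyms and narrower terms with OR tends to raise recall; replacing a vague word by a more suitable term found in the thesaurus tends to raise precision.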

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
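
Both directions of citation searching can be illustrated on a small citation graph. The document identifiers in `REFERENCES` are invented; the sketch assumes that the reference list of each document is known, which is what a citation index has to extract in practice.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "D2005": ["S2000", "C1995"],
    "E2003": ["S2000"],
    "S2000": ["B1990", "C1995"],
    "B1990": ["A1980"],
}

def snowball(seed, references):
    """Follow reference lists back in time, starting from the seed document."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for cited in references.get(doc, []):
            if cited not in found:
                found.append(cited)
                queue.append(cited)
    return found

def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return dict(index)

if __name__ == "__main__":
    print("older:", snowball("S2000", REFERENCES))
    print("more recent:", build_citation_index(REFERENCES).get("S2000", []))
```

The length of an entry in the inverted index is the number of citations received by that document within the collection.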

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
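
The variants listed above can be generated automatically before a link search. This is a minimal sketch; the function name `url_variants` and the list of default file names are assumptions, and the exact query syntax still depends on the search system used.

```python
def url_variants(url):
    """Return the spellings under which other pages may link to the same page."""
    # Drop the scheme (http://), as in the variants shown on the slide.
    bare = url.split("://", 1)[-1]
    variants = [bare]
    # A default file name such as index.html is often left out in links.
    for index_name in ("index.html", "index.htm"):
        if bare.endswith("/" + index_name):
            bare = bare[: -len(index_name)]
            variants.append(bare)
            break
    # The trailing slash is often left out as well.
    if bare.endswith("/"):
        variants.append(bare.rstrip("/"))
    return variants

if __name__ == "__main__":
    for variant in url_variants("http://web-server-computer.country/website/index.html"):
        print(variant)
```

Each variant is then submitted as a separate link query, and the result lists are combined.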

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 131

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
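
The two components described above (an index built from the texts of collected pages, and a search function on top of it) can be sketched as follows. This is a toy illustration, not the design of any particular search engine: the names `build_index` and `search` are hypothetical, the pages are given as plain text instead of being collected by a robot, and combining the query words with AND is an assumption made for the example.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain all words of the query (AND, by assumption)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the query is answered from the index, no page has to be fetched from the Internet at search time.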

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
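
The idea that a page ranks higher when many pages, and in particular "important" pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch only and not Google's actual implementation: the function name `link_rank`, the damping value 0.85 and the fixed number of iterations are assumptions, and every link target must itself appear as a key of `links`.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Score pages by the links they receive.

    `links` maps each page to the list of pages it links to.
    Assumption: every target also occurs as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In a small test graph where three pages link to page "c" and "c" links to page "a", page "c" gets the highest score, and "a" scores higher than pages that receive no links at all, because its single incoming link comes from an important page.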

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
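
What such an expansion amounts to can be sketched as a rewrite of the query into an OR group. This is an illustration under assumptions: the function name `expand_query` and the synonym table passed to it are hypothetical, and the way in which Google itself selects synonyms is not described here.

```python
def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word, [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

For example, with a table that lists "auto" and "automobile" as synonyms of "car", the query `cheap ~car rental` becomes `cheap (car OR auto OR automobile) rental`.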

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
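
Recall and precision, mentioned above, can be computed for a single search when the set of relevant documents is known. A minimal sketch with the standard definitions; the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    # Precision: which fraction of the retrieved documents is relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, a search that retrieves 4 documents of which 2 are relevant, while 3 relevant documents exist in total, has a precision of 2/4 and a recall of 2/3.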

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: User (in / out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (in / out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (in / out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
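
Clustering "on the fly" can be sketched as follows. This is NOT the algorithm used by Vivisimo, which is not described here; it is a deliberately simple stand-in that groups the titles of retrieved hits under the word they share most widely, using only the hits themselves and no pre-built index. The function name `cluster` and the small stop word list are assumptions.

```python
import re
from collections import Counter, defaultdict

# A tiny, illustrative stop word list (assumption).
STOPWORDS = {"the", "of", "a", "an", "and", "in", "at", "for", "on"}


def cluster(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(query.lower().split())

    def candidate_words(title):
        return {
            w for w in re.findall(r"[a-z]+", title.lower())
            if w not in STOPWORDS and w not in query_words
        }

    # In how many titles does each candidate label occur?
    doc_freq = Counter(w for t in titles for w in candidate_words(t))

    clusters = defaultdict(list)
    for title in titles:
        candidates = candidate_words(title)
        if candidates:
            # Prefer the most shared word; break ties alphabetically.
            label = max(candidates, key=lambda w: (doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)
```

For a search on a family name, hits about different persons tend to share different words (a profession, a place), so even this crude grouping separates them to some extent.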

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
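
The integration step with removal of repeated results can be sketched as below. The two "engines" are stand-in functions that return fixed lists (assumptions for illustration; no real search system is contacted), and the rule used to decide that two URLs are "the same" is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Send the query to all engines in parallel, then merge and de-duplicate."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    # Interleave: rank 1 of every engine first, then rank 2, and so on.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Stand-ins for primary search systems (hypothetical results).
def engine_a(query):
    return ["http://www.dogpile.com/", "http://www.kartoo.com"]


def engine_b(query):
    return ["http://dogpile.com", "http://www.mamma.com"]
```

Calling `meta_search("test", [engine_a, engine_b])` returns three hits instead of four, because both engines reported the same site under slightly different URLs.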

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
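
The difference between "modified" and "new" pages can be made concrete with a minimal sketch. It assumes that the page contents have already been downloaded and are passed in as text; the function names and the use of a SHA-256 fingerprint are choices made for this illustration, not a description of any particular alert service.

```python
import hashlib


def fingerprint(page_content: str) -> str:
    """A short, fixed-length summary of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def changed_pages(stored, current_pages):
    """Compare today's pages with the fingerprints kept from the last visit.

    stored:        dict url -> fingerprint (updated in place)
    current_pages: dict url -> page content as retrieved now
    Returns (modified_urls, new_urls).
    """
    modified, new = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)          # never seen before
        elif stored[url] != digest:
            modified.append(url)     # seen before, content differs
        stored[url] = digest
    return modified, new
```

Tracking modifications only needs the list of pages you already know; finding new pages additionally needs a source of candidate URLs, such as the results of a stored search query.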

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
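
How a stored interest profile can drive such alerts is sketched below. The record layout (a `title` and a `category` per book, `keywords` and `categories` per profile) is an assumption made for this illustration and does not describe the system of any particular bookshop.

```python
def matches_profile(book, profile):
    """A book fits when a profile keyword is in its title or its category is selected."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def alerts(new_books, profile):
    """Titles of newly published books that should be announced to this user."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]
```

Keywords give narrow, precise alerts; subject categories give broader alerts that also catch books whose titles do not contain the expected words.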

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
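
The use of thesaurus relations to formulate a better query can be sketched as follows. The thesaurus entry shown is toy data invented for this illustration (it is not copied from any of the systems mentioned), and the function names `expand` and `to_query` are likewise assumptions.

```python
# Toy thesaurus (hypothetical content): term -> relation -> related terms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym (used for)
    },
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the terms reached through the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    return terms


def to_query(terms):
    """Combine terms with OR; multi-word terms become exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; replacing a vague word by a narrower term found in the thesaurus mainly increases precision.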

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
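
The two directions of citation searching can be sketched with a small citation graph. The document names are invented for this illustration, and the functions `backward` and `forward` are hypothetical helpers, not the interface of any citation index.

```python
# Hypothetical data: document -> documents listed in its references.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def backward(doc, cites, depth=2):
    """Snowball searching: follow reference lists towards older documents."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for ref in cites.get(current, []):
                if ref != doc and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def forward(doc, cites):
    """Citation index lookup: more recent documents that cite doc."""
    return sorted(d for d, refs in cites.items() if doc in refs)
```

Snowball searching needs only the documents themselves, because each one carries its own reference list; the forward direction requires a citation index, because the seed document cannot know who will cite it later.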

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
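
The variants listed above can be generated automatically. The following Python sketch assumes that the known URL points to a directory or to a default page named index.html or index.htm; the function name and the list of default file names are assumptions made for this illustration.

```python
from urllib.parse import urlsplit

# Assumed default page names; real servers may use others.
DEFAULT_PAGES = ("index.html", "index.htm")

def url_variants(url):
    """Return the spellings under which a page or site may be linked."""
    parts = urlsplit(url)
    host = parts.netloc
    directory = parts.path
    # Drop a trailing default page name, keeping the directory with its slash.
    for name in DEFAULT_PAGES:
        if directory.endswith("/" + name):
            directory = directory[: -len(name)]
            break
    with_slash = directory if directory.endswith("/") else directory + "/"
    without_slash = with_slash.rstrip("/")
    variants = []
    for path in (parts.path, with_slash, without_slash):
        candidate = "//" + host + path
        if candidate not in variants:  # keep order, skip duplicates
            variants.append(candidate)
    return variants

for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the chosen search system.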

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
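
A small Python sketch of how such an exact-phrase query could be prepared. It assumes that the search system treats text between double quotes as an exact phrase, which should be checked in the help pages of the system; the function names are hypothetical.

```python
from urllib.parse import quote_plus

def exact_phrase_query(url):
    """Wrap the known URL in double quotes so it is searched as one phrase."""
    return '"' + url + '"'

def encoded_query(url):
    """Percent-encode the quoted query for use in a search request URL."""
    return quote_plus(exact_phrase_query(url))

known = "http://www.computer.com/directory/subdirectory/web-page.html"
print(exact_phrase_query(known))
print(encoded_query(known))
```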

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 132

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
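
The mechanism described above (an index built from the text of collected pages, which is then searched instead of the live Internet) can be sketched in Python as follows. This is a toy illustration under simplifying assumptions: the pages are already collected in a dictionary, words are split on non-alphanumeric characters, and all query words must occur in a page. The page URLs and texts are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words (simplified tokenisation)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    pages: dict mapping URL -> page text, as collected by a crawler.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented example pages.
pages = {
    "http://site-a.example/": "Marine biology data sets",
    "http://site-b.example/": "Marine science and oceanography",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the index alone, which is why results can lag behind the current state of the pages.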

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
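
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a textbook PageRank-style iteration. The Python sketch below is only an illustration of that idea; it is not the algorithm actually used by Google, and the damping value, the number of iterations and the tiny link graph are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a simple link-based score for each page.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented link graph: pages a, b and d all link to c; c links to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph page c scores highest because three pages link to it, and page a scores above b and d because the highly ranked page c links to it.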

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
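
What such an automatic expansion amounts to can be sketched in Python. The synonym table below is invented for this illustration and the OR notation is an assumption; how a real search engine selects and applies synonyms is not documented here.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "boat": ["ship", "vessel"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the word
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("cheap ~car rental"))
```

Words without a tilde are left untouched, so the searcher decides which concepts are broadened.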

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
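
Recall and precision, the two measures mentioned above, can be computed as follows for one search. This Python sketch assumes that the set of relevant documents is known, which in practice is only the case in test collections; the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 4 documents retrieved, 4 relevant, 2 in common.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d3", "d4", "d5", "d6"]))
```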

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
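To make the idea of clustering "on the fly" more concrete, here is a toy sketch. It is NOT the method used by Vivisimo (which is not described in this presentation); it only assumes that the result titles are available and it groups them under the word that they share most often. All names (`cluster`, `tokens`, `STOPWORDS`) and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict

# Very small stop word list, for illustration only
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "at"}


def tokens(title, query_terms):
    """Return the distinct words of a title, without stop words and query words."""
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return {w for w in words if w and w not in STOPWORDS and w not in query_terms}


def cluster(titles, query):
    """Group result titles under the word that occurs in the most titles.

    Nothing is prepared before the search: only the retrieved titles are used.
    """
    query_terms = {w.lower() for w in query.split()}
    token_sets = [tokens(t, query_terms) for t in titles]
    doc_freq = Counter(w for s in token_sets for w in s)
    clusters = defaultdict(list)
    for title, s in zip(titles, token_sets):
        if s:
            # sorted() makes ties deterministic (alphabetically first word wins)
            label = max(sorted(s), key=lambda w: doc_freq[w])
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


hits = [
    "Smith physics lectures",
    "Smith physics papers",
    "Smith jazz recordings",
    "Smith jazz concerts",
]
print(cluster(hits, "Smith"))
```

In this invented example, the hits for one family name fall apart into a "physics" group and a "jazz" group, which is the kind of separation that helps the searcher to select faster.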

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
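The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration, not the procedure of any particular meta-search system; the function names `normalise` and `merge_results` and the normalisation rules (ignore upper/lower case in the host name, a leading "www." and a trailing slash) are assumptions.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key so that trivial variants are recognised as equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(result_lists):
    """Merge the result lists of several search systems, keeping the first copy."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


engine_a = ["http://www.dogpile.com/", "http://vivisimo.com/"]
engine_b = ["http://dogpile.com", "http://www.kartoo.com"]
print(merge_results([engine_a, engine_b]))
# prints: ['http://www.dogpile.com/', 'http://vivisimo.com/', 'http://www.kartoo.com']
```

Real systems must also decide how to rank the merged list; this sketch simply keeps the order in which the results arrive.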

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet
directories

Global Internet
indexes

Multi-threaded
search systems

• Only a limited
selection of Internet
sources

• About 1/3 of the
Internet is covered by
an index

• These get information
from directories
and indexes

• Browsing
information sources
is easy

• Searching requires
some skills and
knowledge

• Searching requires
some skills and
knowledge

• Good for broad
searches

• Good for specific,
narrow searches

• Good when even 1
index does not yield
information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
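How a tracking service can detect a modification in an existing page may be sketched like this. It is only one simple approach (comparing a stored fingerprint of the text with a fresh one) and not necessarily what the existing services do; the function names are hypothetical and fetching the page over the network is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a short fingerprint of the page text.

    Whitespace is normalised first, so that a change in layout only
    (extra spaces, line breaks) is not reported as a modification.
    """
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with that of the current page text."""
    return fingerprint(current_text) != stored_fingerprint


stored = fingerprint("Hello  world")
print(has_changed(stored, "Hello world\n"))    # prints: False
print(has_changed(stored, "Hello new world"))  # prints: True
```

Finding completely new pages about a topic is a different problem: that requires repeating a stored search query in an Internet index, as the alert services on the next slides do.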

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
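The procedure above (first consult a thesaurus, then formulate the query) can be sketched as follows. The thesaurus entry for "sea" is invented for this illustration and is not copied from any of the thesaurus systems mentioned; the function names and the choice to expand with UF, NT and RT terms (not BT) are assumptions as well.

```python
# Hypothetical mini-thesaurus; a real thesaurus has thousands of entries.
THESAURUS = {
    "sea": {
        "BT": ["water body"],   # broader term
        "NT": ["coastal sea"],  # narrower term
        "RT": ["ocean"],        # related term
        "UF": ["seas"],         # synonym / used for
    },
}


def expand(term, relations=("UF", "NT", "RT")):
    """Collect the term plus the terms found through the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms


def to_query(terms):
    """Combine the terms with OR; phrases of several words get quotes."""
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


print(to_query(expand("sea")))
# prints: sea OR seas OR "coastal sea" OR ocean
```

Adding synonyms and narrower terms mainly increases recall; whether broader or related terms should be added depends on the search, because they can lower precision.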

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
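The two directions of citation searching can be illustrated with a small sketch. The citation data are invented, and the function names `snowball` and `cited_by` are hypothetical; a real citation index holds the same kind of relation ("document A cites document B") for millions of documents.

```python
# Hypothetical data: each document points to the documents that it cites.
CITES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["old-3"],
    "new-1": ["seed"],
    "new-2": ["seed", "new-1"],
}


def snowball(doc, cites):
    """Backward search: follow the reference lists to older publications."""
    found = []
    queue = list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found


def cited_by(doc, cites):
    """Forward search: which (more recent) documents cite this document?"""
    return sorted(source for source, refs in cites.items() if doc in refs)


print(snowball("seed", CITES))   # prints: ['old-1', 'old-2', 'old-3']
print(cited_by("seed", CITES))   # prints: ['new-1', 'new-2']
```

The backward search needs only the documents themselves, while the forward search needs a citation index, because a document cannot know which later publications will cite it.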

211

***-

Information retrieval:
using citations (scheme)
[Diagram: time axis with Past, Now and Future; starting from the seed document, snowball citation searching points to the past and citation indexing points to the future]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document
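
The first application above, measuring impact by the number of links to a page, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link data is a small hypothetical dictionary, whereas a real search engine derives such data from its own crawl.

```python
from collections import Counter


def count_inlinks(links: dict[str, list[str]]) -> Counter:
    """Count, for each page, how many other pages link to it.

    `links` maps each page URL to the list of URLs it links to
    (hypothetical data; not the output of any real search engine).
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each citing page only once
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts


example_links = {
    "a": ["c", "c", "b"],
    "b": ["c"],
    "c": ["c"],
}
inlinks = count_inlinks(example_links)
print(inlinks.most_common())  # [('c', 2), ('b', 1)]
```

The pages that link to a document are the "who" of the second application; the count is only a rough indicator of importance.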

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
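
The variants listed above can be generated automatically before running a link search. The following Python sketch is an assumption about how one might do this; the function name `url_variants` and the list of index file names are hypothetical and not part of any search engine.

```python
def url_variants(url: str) -> list[str]:
    """Return common spellings under which the same page may be linked."""
    # Drop the scheme so the variants have the "host/path" form used above.
    rest = url.split("://", 1)[-1]
    variants = [rest]
    for index_name in ("index.html", "index.htm"):  # assumed default file names
        if rest.endswith("/" + index_name):
            directory = rest[: -len(index_name)]    # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))
            break
    else:
        # No index file in the URL: only the trailing slash can vary.
        if rest.endswith("/"):
            variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant would then be submitted as a separate link query, following the syntax of the chosen search system.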

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 133

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
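
The mechanism described above (a machine-made index built from the text of collected pages, plus a search function on top of it) can be illustrated with a minimal inverted index in Python. This is a sketch under simplifying assumptions: the pages are given as a small hypothetical dictionary instead of being collected by a robot, and all query words must occur in a page (implicit AND).

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain all words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for documents collected by a crawler.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Marine science and hydrology",
}
index = build_index(pages)
print(search(index, "marine hydrology"))  # {'http://example.org/b'}
```

The query is answered from the index alone; the original pages are not visited at search time, which is why such systems can respond quickly.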

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
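
The two ranking factors above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis iteration in the style of PageRank. This Python sketch is not Google's actual algorithm: the damping value, the number of iterations and the example links are assumptions chosen only for illustration.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Give each page a score that grows with the number and the score
    of the pages linking to it (simplified, PageRank-style iteration)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: a and b link to c, and c links to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example c ranks first because two pages point to it, and a ranks above b because the "important" page c points to a while no page points to b.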

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
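
What such a synonym expansion amounts to can be sketched in Python as a rewrite of the query into OR-groups. This is only a conceptual illustration: the synonym table is hypothetical, and how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace every word marked with a tilde by an OR-group of the word
    and its known synonyms; all other words are left unchanged."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)   # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("cheap ~car rental", SYNONYMS))
# cheap (car OR automobile OR vehicle) rental
```

The expanded query retrieves more documents (higher recall), usually at the cost of some precision.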

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
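
Recall and precision, the two measures named above, can be computed as follows. This Python sketch uses the standard definitions; the document identifiers in the example are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / all retrieved;
    recall = relevant retrieved / all relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are really relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).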

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
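A proximity operator such as NEAR can be sketched as follows. The function name, the default distance of 3 words and the sample sentences are assumptions chosen for this illustration only.

```python
import re


def near(text, term_a, term_b, max_distance=3):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


# Hypothetical texts: both contain "open" and "access".
close = "open access journals are growing"
far = "open source software was discussed at length before the talk about access"

print(near(close, "open", "access"))
print(near(far, "open", "access"))
```

A plain AND query would retrieve both texts; the proximity condition keeps only the one in which the two words are likely to belong together.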

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram. Components shown: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
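The idea of clustering results on the fly can be illustrated with a deliberately simple sketch. It is not the method used by Vivisimo, which is not described here; it only groups result snippets under the most frequent word they share, and the query, snippets and stop-word list are hypothetical.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "of", "and", "a", "in", "at", "for"}


def cluster_results(query, snippets):
    """Group result snippets under the most common non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - query_words - STOPWORDS)
    # Count in how many snippets each candidate label word occurs.
    frequency = Counter(word for words in tokenised for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Highest snippet frequency wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "jaguar".
snippets = [
    "Jaguar car dealer",
    "Jaguar car review",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]

for label, members in cluster_results("jaguar", snippets).items():
    print(label, members)
```

The grouping is computed from the retrieved snippets only, after the search, which is what "WITHOUT pre-processing the documents" refers to.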

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine on the WWW
that offers automatic
clustering (= classification = categorisation = grouping)
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
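The time saving comes from sending one query to several systems at the same time. The sketch below imitates this with two stand-in functions instead of real search engines; `engine_a`, `engine_b`, their URLs and the delay are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def engine_a(query):
    """Stand-in for a first search system (hypothetical)."""
    time.sleep(0.05)  # simulated network delay
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    """Stand-in for a second search system (hypothetical)."""
    time.sleep(0.05)  # simulated network delay
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {"A": engine_a, "B": engine_b}
results = meta_search("thesaurus", engines)
print(results)
```

Because the requests run in parallel threads, the total waiting time is close to that of the slowest system rather than the sum of all of them.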

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
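Integrating results and removing repeated ones can be sketched as below. The normalisation rule (ignore case of the host, a leading "www." and a trailing "/") and the round-robin merging are assumptions for this illustration; real meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key so that trivial variants count as equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked lists returned by two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com"]

print(merge_results([engine_1, engine_2]))
```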

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet / WWW, and what Internet indexes cover]
• telnet, ftp, ...
• Databases and file archives accessible through the Internet (CGI, ASP, ...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Available since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
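Tracking changed and new pages in an automated way can be sketched by storing a fingerprint of each page and comparing it at the next visit. In this sketch the page texts are passed in as plain strings (no real downloading is done), and the URLs and function names are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the pages as they are now.

    previous:      dict mapping URL -> fingerprint from the last visit
    current_pages: dict mapping URL -> page text fetched now
    Returns (new_urls, changed_urls, updated_fingerprints).
    """
    new_urls, changed_urls, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != digest:
            changed_urls.append(url)
    return new_urls, changed_urls, updated


# Hypothetical first visit: nothing is stored yet.
first_visit = {"http://example.org/a": "version 1",
               "http://example.org/b": "stable text"}
_, _, stored = detect_changes({}, first_visit)

# Hypothetical second visit: page a was modified, page c is new.
second_visit = {"http://example.org/a": "version 2",
                "http://example.org/b": "stable text",
                "http://example.org/c": "brand new page"}
new_urls, changed_urls, stored = detect_changes(stored, second_visit)
print(new_urls, changed_urls)
```

An alert service would run such a comparison periodically and send the lists of new and changed URLs by email.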

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most recent two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Diagram: relations between a Term and other terms in a thesaurus]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
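The procedure described above (look up a term in a thesaurus first, then build the query) can be sketched as follows. The thesaurus entries are invented for this illustration and use the relation codes BT, NT, RT and UF; the OR syntax of the resulting query is also an assumption, since each database has its own query language.

```python
# Hypothetical thesaurus entries using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"],
            "RT": ["ocean"], "UF": ["seas"]},
    "ocean": {"BT": ["water body"], "NT": [],
              "RT": ["sea"], "UF": ["oceans"]},
}


def expand_query(term, relations=("UF", "NT", "RT")):
    """Return an OR query of the term plus the thesaurus terms for the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are put between quotes so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("BT",)))
```

Adding synonyms, narrower and related terms mainly raises recall; choosing a narrower term instead of the original one is a way to raise precision.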

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
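Both directions of citation searching can be sketched with a small set of reference lists. The document names below are hypothetical; `snowball` follows references back to older work, while `build_citation_index` inverts the reference lists, which is the basic idea behind a citation index.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the older documents it cites.
references = {
    "seed_2000": ["paper_1990", "paper_1995"],
    "paper_1995": ["paper_1990"],
    "paper_2003": ["seed_2000"],
    "paper_2004": ["seed_2000", "paper_1995"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> list of citing documents."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return dict(index)


def snowball(seed, references):
    """Follow reference lists back in time, collecting all older documents."""
    found, to_visit = [], list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found


citation_index = build_citation_index(references)
print(snowball("seed_2000", references))   # older publications
print(citation_index["seed_2000"])         # more recent publications citing the seed
```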

211

***-

Information retrieval:
using citations (scheme)
[Diagram: time axis from Past to Future, with the seed document at Now; snowball citation searching leads to the past, citation indexing leads to the future]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
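
A small helper can generate these variants, so that none is forgotten when formulating link queries. This is a sketch under assumptions: the function name url_variants is invented, and the default file name is assumed to be index.html or index.htm; other servers may use other default names.

```python
def url_variants(url):
    """Return the spellings of a URL under which other pages may link to it."""
    # Drop the scheme so that the variants start with "//", as in the list above.
    rest = url.split("://", 1)[-1]
    base = rest
    # Strip a default file name (assumed here: index.html or index.htm).
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    variants = ["//" + base, "//" + base + "/"]
    # Keep the full original form too, if it differs from the bare directory.
    if rest.rstrip("/") != base:
        variants.append("//" + rest)
    return variants

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query to the search system of your choice.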

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 134

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
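
The mechanism can be sketched as follows. This is a minimal illustration, not the implementation of any real search engine: the pages in PAGES are invented and stand in for what a robot would have collected, and the search function assumes that all query words must occur in a page (an implicit AND).

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography",
    "http://example.org/b": "Biology of freshwater fish",
    "http://example.org/c": "Oceanography data portal",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

INDEX = build_index(PAGES)
results = search(INDEX, "marine biology")
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why results can lag behind the live WWW.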

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
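
The idea that links from “important” pages count for more can be illustrated with a simplified link-analysis calculation. This sketch is NOT the algorithm actually used by Google, whose details are not given here; the function name link_rank, the damping value 0.85 and the tiny LINKS graph are assumptions chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
ranks = link_rank(LINKS)
```

In this toy graph, page c scores higher than page b because two pages point to it, and page d scores higher than page a because its single incoming link comes from the well-linked page c.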

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
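
Both approaches can be sketched as query-building helpers. The function names and the SYNONYMS table are invented for this illustration, and the (a OR b) notation for manual expansion is an assumption about the query syntax; check the help pages of the system you use.

```python
# Hypothetical hand-made synonym list; a real thesaurus would supply these.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def tilde_query(words, expand):
    """Automatic expansion: put a tilde in front of the selected words."""
    return " ".join("~" + w if w in expand else w for w in words)

def manual_query(words, synonyms):
    """Manual expansion: spell out the chosen synonyms with OR."""
    parts = []
    for w in words:
        alternatives = [w] + synonyms.get(w, [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(w)
    return " ".join(parts)

automatic = tilde_query(["used", "car", "prices"], {"car"})
manual = manual_query(["used", "car", "prices"], SYNONYMS)
```

The manual form is longer, but the searcher decides which synonyms are included instead of leaving that choice to the system.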

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
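
Recall and precision can be computed once it is known which documents are relevant. The sketch below uses invented document identifiers; in practice the complete set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 3 relevant ones exist.
precision, recall = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d2", "d4", "d5"],
)
```

In this example two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).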

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
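When a search system offers no truncation, the searcher can approximate it by listing the word variants explicitly. The following is only a sketch of that idea, assuming that the target system accepts an OR operator and that the searcher has a word list at hand; the function name and the vocabulary are invented for illustration.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR query."""
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    # Keep every known word that starts with the stem, in alphabetical order.
    matches = sorted(w for w in vocabulary if w.lower().startswith(stem))
    return " OR ".join(matches) if matches else stem


# Hypothetical word list; a real one could come from a dictionary or thesaurus.
vocabulary = ["library", "libraries", "librarian", "liberty", "literature"]
query = expand_truncation("librar*", vocabulary)
print(query)  # librarian OR libraries OR library
```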

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
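To illustrate what a proximity operator such as NEAR does, here is a minimal sketch that tests whether two words occur within a given number of words of each other in a text. It is an illustration of the concept only, not the implementation of any particular search system; the function name near and the default distance are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "Open access journals are indexed in a directory that is free of charge."
print(near(doc, "open", "journals", max_distance=2))  # True
print(near(doc, "open", "charge", max_distance=2))    # False
```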

96

**--

Meta-search systems:
scheme 1
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
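Clustering on the fly can be illustrated with a deliberately simple sketch. The slides do not describe the method that Vivisimo uses, so the code below is NOT that method; it only shows the general idea of grouping retrieved titles under a heading derived from the results themselves, after the search. The stopword list, function name and example titles are invented.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on"}


def cluster_results(query, titles):
    """Group result titles under the most common non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        tokenised.append(words - STOPWORDS - query_words)
    # Count in how many titles each candidate heading word occurs.
    frequency = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        if words:
            # The most frequent word wins; ties are broken alphabetically.
            heading = min(words, key=lambda w: (-frequency[w], w))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "jaguar".
titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]
clusters = cluster_results("jaguar", titles)
for heading, members in clusters.items():
    print(heading, "->", members)
```

No index of the documents is built beforehand: the headings are computed from the result list only, which is what makes the approach usable in a meta-search system.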

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
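The time saving comes from sending one query to several search systems at the same time. A minimal sketch of this, assuming two stand-in functions in place of real search systems (no real service is contacted, and all names and URLs are invented):

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def search_engine_1(query):
    return ["http://example.org/a", "http://example.org/b"]


def search_engine_2(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {"engine 1": search_engine_1, "engine 2": search_engine_2}
results = meta_search("open access", engines)
for name, urls in results.items():
    print(name, urls)
```

The user formulates the query once; the program takes care of the separate requests.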

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
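Integration with removal of repeated results can be sketched as follows. This is one possible approach under simple assumptions, not the method of any named system: the ranked lists are interleaved, and two URLs count as the same result when they are equal after lowercasing and dropping a trailing slash (a crude rule, since paths can be case-sensitive).

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation, an assumption made for this sketch only.
                key = results[rank].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
list_1 = ["http://example.org/a", "http://example.org/b"]
list_2 = ["http://example.org/b/", "http://example.org/c"]
merged = merge_results([list_1, list_2])
print(merged)
```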

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.
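Why database contents stay invisible to a classical Internet index can be shown with a toy crawler. The sketch assumes a crawler that follows static hyperlinks only; the page names are invented, and real crawlers are more sophisticated.

```python
from collections import deque

# Hypothetical toy web: each page lists the pages it links to statically.
static_links = {
    "home": ["about", "search_form"],
    "about": ["home"],
    "search_form": [],  # results appear only after a query is submitted
    "record_1": [],     # database record, generated on request
    "record_2": [],     # database record, generated on request
}


def crawl(start, links):
    """Breadth-first crawl that follows static hyperlinks only."""
    found, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


indexed = crawl("home", static_links)
hidden = set(static_links) - indexed
print(sorted(indexed))  # ['about', 'home', 'search_form']
print(sorted(hidden))   # ['record_1', 'record_2']
```

The database records exist and are accessible through the search form, but no static link leads to them, so they never enter the index.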

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: the Internet comprises more than the indexed WWW:
» telnet, ftp, ...
» databases and file archives accessible through the Internet (CGI, ASP,...)
» static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
» information accessible only when passwords are used
» rapidly changing information, such as news
» Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
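Tracking modifications in an existing page can work by storing a fingerprint of the page and comparing it with a fingerprint of the page as fetched later. The sketch below shows this principle on plain strings; fetching the page over the network is left out, and the function names are illustrative assumptions.

```python
import hashlib


def fingerprint(page_content):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, new_content):
    """Compare the stored fingerprint with that of the newly fetched content."""
    return fingerprint(new_content) != old_fingerprint


# Hypothetical page contents at two moments in time.
stored = fingerprint("<html>Conference programme, version 1</html>")
print(has_changed(stored, "<html>Conference programme, version 1</html>"))  # False
print(has_changed(stored, "<html>Conference programme, version 2</html>"))  # True
```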

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
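An alert service of this kind can be thought of as repeating a stored query and reporting only what was not seen before. This is a sketch of that principle with invented URLs; how Google Alert works internally is not described in these slides.

```python
def new_results(previous_urls, current_urls):
    """Return the URLs of the current result list that were not seen before."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]


# Hypothetical results of the same stored query on two different days.
last_week = ["http://example.org/a", "http://example.org/b"]
today = ["http://example.org/b", "http://example.org/c", "http://example.org/a"]
alerts = new_results(last_week, today)
print(alerts)  # ['http://example.org/c']
```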

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM / RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM / RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.
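One concrete result of the Open Archives Initiative is its Protocol for Metadata Harvesting (OAI-PMH), in which a harvester asks a repository for records through simple HTTP requests. The sketch below only builds such a request URL; the repository address is a made-up placeholder, and the helper name oai_request is an assumption. The verb ListRecords and the metadata prefix oai_dc are part of the protocol.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb, **arguments}
    return base_url + "?" + urlencode(params)


# Hypothetical repository address; no request is actually sent here.
url = oai_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```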

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
» BT (= Broader Term): term(s) with broader meaning
» NT (= Narrower Term): term(s) with narrower meaning
» RT (= Related Term): other term(s)
» UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
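The use of a thesaurus before searching can be sketched as a small query expansion step. The thesaurus fragment below is invented for illustration, and the sketch assumes that the target database accepts an OR operator and quoted phrases.

```python
# Hypothetical thesaurus fragment using the relations BT, NT, RT and UF.
thesaurus = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Add synonyms, narrower and related terms to a one-term query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", thesaurus)
print(query)  # sea OR seas OR "coastal sea" OR ocean
```

Adding synonyms and narrower terms mainly raises recall; choosing a more specific term instead of a broad one mainly raises precision.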

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
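Both directions of citation searching can be sketched on a small set of citation data. The document identifiers are invented; snowball follows the reference lists backwards in time, while citing_documents plays the role of a citation index lookup.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
cites = {
    "A2005": ["B2001", "C1998"],
    "B2001": ["C1998", "D1990"],
    "C1998": ["D1990"],
    "D1990": [],
    "E2004": ["B2001"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, stack = set(), [seed]
    while stack:
        for ref in cites.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found


def citing_documents(seed, cites):
    """Citation index lookup: which more recent documents cite the seed?"""
    return {doc for doc, refs in cites.items() if seed in refs}


older = snowball("B2001", cites)
newer = citing_documents("B2001", cites)
print(sorted(older))  # ['C1998', 'D1990']
print(sorted(newer))  # ['A2005', 'E2004']
```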

211

***-

Information retrieval:
using citations (scheme)
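[Diagram: a time axis (Past, Now, Future) around a seed document; snowball citation searching points from the seed document to the past, citation indexing points from it to the future.]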

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
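
As a rough sketch of what such an estimate involves, citations can be counted per article and per author once a citation search has produced pairs of citing and cited articles. The article identifiers and author names below are invented for illustration.

```python
from collections import Counter

# Hypothetical result of a citation search: (citing article, cited article) pairs.
citations = [("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p5", "p1")]

# Hypothetical authorship table for the cited articles.
authors = {"p1": "Author A", "p2": "Author B"}

def citations_per_article(pairs):
    """Count how many times each article is cited."""
    return Counter(cited for _, cited in pairs)

def citations_per_author(pairs, author_of):
    """Add up the citations received by all articles of each author."""
    counts = Counter()
    for _, cited in pairs:
        counts[author_of[cited]] += 1
    return counts

article_counts = citations_per_article(citations)
author_counts = citations_per_author(citations, authors)
```

Such counts are only one factor in an evaluation; they depend strongly on which citing documents the search system covers.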

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
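
Generating these variants can be automated. The sketch below is a minimal helper written for this illustration (the function name and the list of default page names are assumptions); it returns the spellings under which a page may have been linked.

```python
def url_variants(url):
    """Return the common spellings under which a page's address may be linked."""
    variants = [url]
    base = url
    # Strip an assumed default page name, if present.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    with_slash = base if base.endswith("/") else base + "/"
    without_slash = with_slash.rstrip("/")
    for variant in (with_slash, without_slash):
        if variant not in variants:
            variants.append(variant)
    return variants

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant would then be submitted as a separate link query to the chosen search system.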

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
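
The principle of this “normal” searching can be sketched on a tiny, invented collection of pages: a page counts as citing the target when its text contains the URL string. The page addresses and texts below are hypothetical.

```python
# Hypothetical mini-collection: page address -> page text.
pages = {
    "http://a.example/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://b.example/review.html":
        "A critique of computer.com/directory/subdirectory/web-page.html",
    "http://c.example/other.html":
        "Nothing relevant here.",
}

def pages_mentioning(target, collection):
    """Return the addresses of pages whose text contains the target string."""
    needle = target.lower()
    return sorted(addr for addr, text in collection.items() if needle in text.lower())

full = pages_mentioning(
    "http://www.computer.com/directory/subdirectory/web-page.html", pages)
partial = pages_mentioning(
    "computer.com/directory/subdirectory/web-page.html", pages)
```

Searching for a shorter part of the URL finds more citing pages, at the risk of lower precision.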

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 135


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

- contents
- summary
- structure
- overview
of this tutorial / workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
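
The difference between the two orderings can be shown on one invented directory category. The site names and link counts are hypothetical, and plain link counting is only the “simply stated” version of the ranking described above.

```python
# Hypothetical directory category: site -> number of links received from other pages.
category = {"zeta.example": 120, "alpha.example": 15, "mid.example": 48}

# Ordering as in a purely alphabetical directory listing.
alphabetical = sorted(category)

# Ordering by the number of links received, most-linked site first.
by_links = sorted(category, key=lambda site: category[site], reverse=True)
```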

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems
[Scheme: starting from the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
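
The two components can be sketched as a very small inverted index. This is a simplified illustration, not the design of any particular search engine: the collected pages are invented, and the query logic (all words must occur) is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
collected = {
    "http://one.example/": "Open access journals in marine science",
    "http://two.example/": "Marine biology databases",
    "http://three.example/": "Open archives and repositories",
}

def build_index(pages):
    """Build the index: each word points to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(collected)
hits = search(index, "marine open")
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why the search is fast but can be out of date.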

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
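
The idea that both the number and the importance of linking pages count can be sketched with a simplified PageRank-style computation. This is an illustration under stated assumptions, not Google's actual algorithm: the function name, the damping value 0.85, the number of iterations and the tiny web graph are all chosen for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links.

    links: dict mapping a page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: three pages link to "hub"; "hub" links to "target";
# an unremarkable page "d" links to "lonely".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["target"], "d": ["lonely"]}
scores = link_rank(web)
```

Here "hub" scores higher than "a" because many pages point to it, and "target" scores higher than "lonely" although both receive a single link, because the link to "target" comes from a more important page.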

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
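
What such an automatic expansion amounts to can be sketched as follows. The synonym table is invented for this illustration and the OR notation is an assumption; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real search engine relies on its own, much larger resources.
SYNONYMS = {"car": ["automobile", "vehicle"], "cheap": ["inexpensive"]}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)

expanded = expand_query("used ~car prices")
```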

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
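
The effect of categorizing results can be sketched with a very crude method: assigning each result to the category whose cue words it shares. The result titles and cue words are invented, and real systems use their own, more advanced clustering techniques.

```python
# Hypothetical result titles for the ambiguous query "pascal".
results = [
    "Blaise Pascal: philosopher and mathematician",
    "Turbo Pascal compiler tutorial",
    "Pascal programming language reference",
    "Pascal the philosopher: the wager",
]

# Hypothetical cue words per category.
CATEGORIES = {
    "person": {"blaise", "philosopher", "mathematician"},
    "programming": {"compiler", "programming", "language"},
}

def categorize(titles, categories):
    """Group titles under the category with which they share the most cue words."""
    groups = {name: [] for name in categories}
    groups["other"] = []
    for title in titles:
        words = set(title.lower().replace(":", " ").split())
        best = max(categories, key=lambda name: len(words & categories[name]))
        if words & categories[best]:
            groups[best].append(title)
        else:
            groups["other"].append(title)
    return groups

groups = categorize(results, CATEGORIES)
```

The user can then pick the group that matches the intended meaning instead of scanning one mixed list.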

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
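
Combining the results of several search systems, which is also what a meta-search engine does, can be sketched as below. The result lists are invented, and the duplicate detection (ignoring letter case and a trailing slash) is a deliberately crude assumption.

```python
def merge_results(*result_lists):
    """Combine result lists from several engines, keeping each address only once."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = url.rstrip("/").lower()  # crude normalisation to detect duplicates
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical result lists from three different Internet indexes.
engine_a = ["http://x.example/", "http://y.example/page"]
engine_b = ["http://y.example/page", "http://z.example"]
engine_c = ["http://X.example", "http://w.example/"]

combined = merge_results(engine_a, engine_b, engine_c)
```

Each index contributes addresses that the others miss, so the combined list is longer than any single one.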

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
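
The terms recall and precision used above can be made concrete with a small calculation. This is only a minimal sketch: the function name and the sample document identifiers are assumptions made for illustration, and in practice the full set of relevant documents on the WWW is rarely known.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents returned by the search system
    relevant:  documents that are actually relevant (assumed to be known)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set: 4 documents retrieved, 3 relevant ones exist.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r)
```

A specialised index that returns fewer irrelevant documents raises precision; one that covers more of the relevant documents in its field raises recall.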

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
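
When a search system offers no truncation, the searcher can work around this by listing the word variants explicitly and combining them with OR. The sketch below shows the idea; the function names and the small word list are assumptions made for illustration, not a feature of any search system.

```python
def expand_truncation(pattern, vocabulary):
    """Return the words in `vocabulary` that match a truncated term
    such as 'librar*' (everything that starts with 'librar')."""
    stem = pattern.rstrip("*").lower()
    return sorted({w.lower() for w in vocabulary if w.lower().startswith(stem)})


def or_query(words):
    """Combine the word variants into one query with the OR operator."""
    return " OR ".join(words)


# Hypothetical word list, drawn up by the searcher.
variants = expand_truncation("librar*", ["library", "libraries", "librarian", "liberty"])
print(or_query(variants))
```

The quality of the result depends entirely on the word list that the searcher supplies; a thesaurus can help to draw it up.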

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
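
To illustrate what a proximity operator such as NEAR does, here is a minimal sketch that checks whether two words occur within a given number of words of each other in a text. The function name and the default distance are assumptions; real search systems define NEAR in their own ways.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True when term_a and term_b occur at most
    `max_distance` word positions apart in `text`."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "open access to scientific journals"
print(near(sentence, "open", "journals", max_distance=4))
print(near(sentence, "open", "journals", max_distance=3))
```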

96

**--

Meta- search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
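
How clustering "on the fly" can work in principle is sketched below: the result snippets are grouped after retrieval, using only the words they contain. This is a strongly simplified illustration under stated assumptions (a small stopword list, invented snippets, one label per result) and not the method that Vivisimo actually uses.

```python
import re
from collections import Counter, defaultdict

# Small stopword list, assumed for this illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared word they contain."""
    query_terms = set(query.lower().split())
    tokenised = [set(re.findall(r"[a-z]+", s.lower())) - STOPWORDS - query_terms
                 for s in snippets]
    # In how many snippets does each word occur?
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Candidate labels: words shared with at least one other snippet.
        shared = [w for w in words if doc_freq[w] > 1]
        label = max(shared, key=lambda w: (doc_freq[w], w)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Compilers for the Pascal programming language",
]
for label, group in cluster_results("pascal", snippets).items():
    print(label, group)
```

No index of the documents is built beforehand; the grouping is computed from the retrieved results only, which is what makes it usable on top of results from other search systems.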

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
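
The idea behind a multi-threaded search client can be sketched as follows: the same query is sent to several search systems in parallel and the answers are collected. The two "engines" below are hypothetical stand-ins that return canned results; a real client would contact the search systems over the Internet. This is not how Copernic is implemented, only an illustration of the principle.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, engines):
    """Send `query` to every engine in parallel.

    engines: dict mapping an engine name to a function(query) -> list of results
    Returns a dict mapping each engine name to its list of results.
    """
    with ThreadPoolExecutor(max_workers=len(engines) or 1) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real search systems.
engines = {
    "engine_a": lambda q: [q + " result from A"],
    "engine_b": lambda q: [q + " result from B"],
}
print(search_all("open access", engines))
```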

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
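
The integration of results with removal of repeated results can be sketched as follows. The ranked lists are interleaved (first the number 1 of each system, then the number 2, and so on) and a link is skipped when the same address was already seen. The normalisation rules (ignoring upper/lower case in the host name, a leading "www." and a trailing slash) are assumptions made for this illustration.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key so that trivial variants compare as equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*ranked_lists):
    """Interleave ranked result lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(results) for results in ranked_lists), default=0)
    for rank in range(longest):
        for results in ranked_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
print(merge_results(
    ["http://www.vub.ac.be/", "http://example.org/a"],
    ["http://vub.ac.be", "http://example.org/b"],
))
```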

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
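
The principle of tracking changes in WWW pages can be sketched as follows: a short "fingerprint" of each page is stored, and at the next visit the new fingerprint is compared with the stored one. The function names and the sample pages are assumptions; downloading the pages and sending the alert by email are left out of this sketch.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the text of a page.
    Differences in whitespace only are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with the pages as they are now.

    previous: dict url -> fingerprint stored at the last visit
    current:  dict url -> text of the page now
    Returns (new_urls, modified_urls).
    """
    new, modified = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return sorted(new), sorted(modified)


# Hypothetical pages at two moments in time.
stored = {"http://example.org/news": fingerprint("Old text")}
now = {"http://example.org/news": "New text",
       "http://example.org/report": "A new page"}
print(changed_pages(stored, now))
```

This shows the two cases distinguished in the services described here: a modified version of a known page and a page that was not known before.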

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
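
The use of a thesaurus to formulate a better query can be sketched as follows. The thesaurus entry below is invented for this illustration and is not taken from any real thesaurus; the relation codes BT, NT, RT and UF are the ones explained on the previous slide.

```python
# Invented thesaurus entry, for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # used for (treated as a synonym here)
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and the terms linked to it
    through the chosen thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for linked in entry.get(relation, []):
            if linked not in terms:
                terms.append(linked)
    # Terms of more than one word are put between quotation marks.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms tends to increase recall; adding broader or related terms as well would increase recall further, but usually at the cost of precision.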

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
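
The two directions of citation searching can be sketched with a tiny citation graph. The data and the function names below are hypothetical; a real citation index is of course much larger and is queried through its own search interface.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites):
    """Follow references backwards in time, collecting all older documents."""
    found, stack = set(), [doc]
    while stack:
        for ref in cites.get(stack.pop(), []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found


def cited_by(doc, cites):
    """Citation-index lookup: which documents cite the given one?"""
    return {source for source, refs in cites.items() if doc in refs}


older = snowball("seed", CITES)   # publications reached via references
newer = cited_by("seed", CITES)   # more recent publications citing the seed
```

Snowballing needs only the reference lists of the documents themselves, whereas the second lookup needs an index built over many citing documents.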

211

***-

Information retrieval:
using citations (scheme)
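[Scheme: the seed document on a time axis (Past, Now, Future);
snowball citation searching leads back to older publications,
citation indexing leads forward to more recent publications.]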

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
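
A rough sketch of such an estimate, assuming the results of a citation search are available as pairs of (citing article, cited article). The article identifiers and author names below are invented for illustration only.

```python
from collections import Counter

# Hypothetical citation search results: (citing article, cited article).
CITATIONS = [
    ("a3", "a1"),
    ("a4", "a1"),
    ("a4", "a2"),
    ("a5", "a2"),
    ("a5", "a1"),
]

# Hypothetical mapping from article to its author.
AUTHOR_OF = {
    "a1": "Author X",
    "a2": "Author Y",
    "a3": "Author Y",
    "a4": "Author Z",
    "a5": "Author Z",
}

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited in CITATIONS)

# Number of citations received by each author (summed over their articles).
per_author = Counter(AUTHOR_OF[cited] for _, cited in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```

Such counts remain estimates: they depend on which citing documents the search system covers.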

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
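
The variants listed above can be generated automatically before searching. This is only a sketch: the function name is hypothetical, and the assumption that the default file is called index.html (or index.htm) does not hold for every web server.

```python
def url_variants(url):
    """Return common variants of a URL that may all be used as link targets."""
    base = url.rstrip("/")
    # Strip an assumed default file name, if present.
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link search, using the query syntax of the chosen search system.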

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 136

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented
meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: starting from the complete WWW, a global subject
directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
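
The mechanism described above can be sketched as a small inverted index: a mapping from each word to the pages that contain it, which is then searched instead of the live Internet. The page texts and URLs below are hypothetical stand-ins for what a crawler ("robot") would collect, and real systems add ranking, stemming and much more.

```python
from collections import defaultdict

# Hypothetical pages, as if collected by a crawler ("robot").
PAGES = {
    "http://example.org/a": "marine biology of the north sea",
    "http://example.org/b": "hydrology and sea level",
    "http://example.org/c": "marine science portal",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
hits = search(INDEX, "marine sea")
print(sorted(hits))
```

Because the query is answered from the pre-built index, the answer reflects the state of the pages at the time they were collected, not at the time of the search.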

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
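
The slide only states that a page ranks higher when many pages, or "important" pages, point to it. The sketch below shows one well-known way to turn that idea into numbers, a simplified PageRank-style iteration. It is offered as an illustration under assumptions, not as Google's actual implementation: the link graph, the damping value 0.85 and the function name are all hypothetical choices.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page c receives the most links and ranks first; page a receives only one link, but from the "important" page c, so it ranks above b and d.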

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
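
A small helper can build such a query string from a list of words. The function name is hypothetical; it only reproduces the tilde syntax described above and does not contact any search system.

```python
def mark_for_expansion(words, expand):
    """Prefix the selected query words with a tilde; leave the others as-is."""
    return " ".join("~" + w if w in expand else w for w in words)


query = mark_for_expansion(["word1", "word2", "word3", "word4"], {"word2"})
print(query)
```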

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
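
The effect of categorizing search results can be illustrated with a very simple, rule-based sketch for an ambiguous query such as pascal. The result titles and cue words below are hypothetical, and real systems derive the clusters automatically rather than from a hand-made list; this is not how Teoma itself works.

```python
from collections import defaultdict

# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler download",
    "Pensées by Blaise Pascal",
]

# Hypothetical, hand-picked cue words per category.
CUES = {
    "Person": ["blaise", "philosopher"],
    "Programming language": ["programming", "compiler"],
}


def categorize(results, cues):
    """Group result titles under the first category whose cue word matches."""
    groups = defaultdict(list)
    for title in results:
        lowered = title.lower()
        label = next(
            (cat for cat, words in cues.items()
             if any(word in lowered for word in words)),
            "Other",
        )
        groups[label].append(title)
    return dict(groups)


clusters = categorize(RESULTS, CUES)
for label, titles in clusters.items():
    print(label, titles)
```

The user can then pick the cluster that matches the intended meaning instead of scanning one mixed result list.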

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
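
Recall and precision, mentioned above, can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers and both result lists are invented, and it assumes that the set of relevant documents is known in advance, which is rarely the case in a real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # precision: which share of the retrieved documents is relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # recall: which share of the relevant documents was retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers, invented for this illustration.
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "d7", "d8", "d9", "d2"]
specialised_results = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_results, relevant))      # (0.4, 0.5)
print(precision_recall(specialised_results, relevant))  # (0.75, 0.75)
```

In this invented case the specialised system scores higher on both measures; with real systems the outcome depends on the subject and on the query.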

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
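
When a search system offers no truncation, a searcher can list the word variants explicitly and combine them with OR, provided the system supports a Boolean OR. The sketch below shows what right-hand truncation does; the word list is invented and the function name is hypothetical, not a feature of any particular search engine.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a right-truncated term such as 'librar*'."""
    if not pattern.endswith("*"):
        # no truncation symbol: only the exact word matches
        return [word for word in vocabulary if word == pattern]
    stem = pattern[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))


# Invented word list that stands in for the index of a search system.
vocabulary = ["librarian", "libraries", "library", "liberty", "literature"]

variants = expand_truncation("librar*", vocabulary)
query = " OR ".join(variants)
print(query)  # librarian OR libraries OR library
```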

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
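
To illustrate what a proximity operator such as NEAR does, the following minimal sketch tests whether two words occur within a given number of word positions of each other. The function name, the default distance and the sample sentence are assumptions made for this illustration; real systems define NEAR in their own ways.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur at most max_distance words apart."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sentence = "The library offers free access to bibliographic databases."

print(near(sentence, "free", "access", max_distance=1))        # True
print(near(sentence, "library", "databases", max_distance=3))  # False
print(near(sentence, "library", "databases", max_distance=6))  # True
```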

96

**--

Meta-search systems:
scheme 1
[Scheme. Components shown: user (in / out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Scheme. Components shown: user (in / out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Scheme. Components shown: user (in / out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
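
The idea of clustering results on the fly can be illustrated with a toy example. This is NOT the method used by Vivisimo, which is not described here; it is only a sketch in which results are grouped under the words that several titles share. The titles, the stopword list and the function name are invented for this illustration.

```python
from collections import defaultdict

# Small invented stopword list; a real system would use a much longer one.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_results(query, titles):
    """Group result titles under the words they share, apart from the query terms."""
    query_terms = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        words = {word.strip(".,:;!?").lower() for word in title.split()}
        for word in sorted(words - query_terms - STOPWORDS):
            clusters[word].append(title)
    # keep only the headings that are shared by at least two results
    return {label: items for label, items in sorted(clusters.items()) if len(items) >= 2}


# Invented result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Blaise Pascal",
    "Pascal programming examples",
]

for label, items in cluster_results("pascal", titles).items():
    print(label, "->", items)
```

The headings are computed only from the titles that came back for this one search, so nothing has to be prepared before the search is made. A result can appear under more than one heading in this simple version.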

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme. Components shown: user (in / out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
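
The integration of results with removal of repeated results can be sketched as follows. This is a simplified illustration and not the method of any particular meta-search system: the engine names and the ranking rule are assumptions, and the URL normalisation (ignoring the protocol, a leading "www." and a trailing slash) is deliberately crude.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a simple key so that trivial variants are seen as the same page."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Merge ranked URL lists from several engines and remove repeated results."""
    merged = {}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalise(url), {"url": url, "engines": [], "best_rank": rank}
            )
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # assumed ranking rule: found by more engines first, then best rank, then URL
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["engines"]), e["best_rank"], e["url"]),
    )


# Hypothetical result lists from two hypothetical primary search systems.
result_lists = {
    "engine_a": ["http://www.vivisimo.com", "http://www.kartoo.com", "http://www.mamma.com"],
    "engine_b": ["http://vivisimo.com/", "http://dogpile.com/"],
}

merged = merge_results(result_lists)
for entry in merged:
    print(entry["url"], entry["engines"])
```

Here the two variants of the Vivisimo address are recognised as one result, which is listed first because both engines returned it.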

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is accessible through the Internet and what Internet indexes cover]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files
• Other Internet services: telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
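
How a tracking service can notice that a page was modified can be sketched with a stored fingerprint of the page text. This is only a minimal illustration, not the method of any particular alert service: the address below is a placeholder, the page texts are invented, and fetching the page over the network is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a hash of the page text, ignoring differences in white space."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_for_update(url, page_text, stored):
    """Compare a page with its stored fingerprint; return 'new', 'changed' or 'unchanged'."""
    new_print = fingerprint(page_text)
    old_print = stored.get(url)
    stored[url] = new_print  # remember the latest version for the next check
    if old_print is None:
        return "new"
    return "changed" if old_print != new_print else "unchanged"


stored = {}                            # would be kept on disk or on a server
url = "http://www.example.org/news"    # placeholder address

print(check_for_update(url, "Conference in May 2005", stored))       # new
print(check_for_update(url, "Conference  in May 2005\n", stored))    # unchanged
print(check_for_update(url, "Conference in June 2005", stored))      # changed
```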

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
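
Simultaneous, parallel searching over several catalogues can be sketched with a pool of threads. Everything in this example is invented for the illustration: the catalogue names, their contents and the function names. In a real service the function search_catalogue would send a request over the network (for instance using Z39.50) instead of looking in a local list.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented catalogues that stand in for remote library and bookshop databases.
CATALOGUES = {
    "library_a": ["Marine biology", "Coastal ecology"],
    "library_b": ["Ocean currents", "Marine biology"],
    "bookshop_c": ["Marine engineering"],
}


def search_catalogue(name, query):
    """Stand-in for a network request to one catalogue."""
    titles = CATALOGUES[name]
    return name, [title for title in titles if query.lower() in title.lower()]


def search_all(query):
    """Send the same query to all catalogues in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, query) for name in CATALOGUES]
        return dict(future.result() for future in futures)


print(search_all("marine"))
```

Only one simple query form is passed to every target, which reflects the limitation noted above: the search options are restricted to what all target systems have in common.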

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
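
The use of a thesaurus to formulate a better query can be sketched as follows. The thesaurus entry for "sea" is invented for this illustration and is not taken from any real thesaurus; the function name is hypothetical, and the result assumes that the target database accepts a Boolean OR and quoted phrases.

```python
# Invented thesaurus entry, using the relation codes BT, NT, RT and UF.
thesaurus = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and the terms found through the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # put terms that consist of several words between quotation marks
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", thesaurus))
# sea OR ocean OR bay OR gulf
print(expand_query("sea", thesaurus, relations=("UF", "BT")))
# sea OR ocean OR "body of water"
```

Adding synonyms and narrower terms tends to increase recall; adding broader terms widens the search further and can lower precision, so the choice of relations depends on the aim of the search.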

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
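
The two directions can be sketched with a small, invented set of reference lists (all document names below are hypothetical). Following reference lists leads backwards in time; inverting them, which is what a citation index does, leads forwards.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, transitively."""
    found, todo = [], [seed]
    while todo:
        doc = todo.pop(0)
        for cited in references.get(doc, []):
            if cited not in found and cited != seed:
                found.append(cited)
                todo.append(cited)
    return found


def citing_documents(seed, references):
    """What a citation index provides: documents whose references include the seed."""
    return sorted(doc for doc, cited in references.items() if seed in cited)


print(snowball("seed-2000", REFERENCES))
print(citing_documents("seed-2000", REFERENCES))
```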

211

***-

Information retrieval:
using citations (scheme)
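(Diagram along a time axis running from past to future:
starting from the seed document, snowball citation searching
leads back to older publications, while citation indexing
leads forward to more recent publications, up to now.)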

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
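
How such counts could be derived from the results of a citation search is sketched below. The citation pairs and author names are invented, and a real count depends on which citing sources the database covers, so the numbers are only estimates.

```python
from collections import Counter

# Hypothetical data: (citing document, cited article) pairs and article authors.
CITATIONS = [("p1", "art-A"), ("p2", "art-A"), ("p3", "art-B"), ("p3", "art-A")]
AUTHORS = {"art-A": ["Author X"], "art-B": ["Author X", "Author Y"]}


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, authors):
    """Credit every citation of an article to each of its authors."""
    counts = Counter()
    for _citing, cited in citations:
        for author in authors.get(cited, []):
            counts[author] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, AUTHORS))
```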

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
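
A small helper can generate these variants so that none is forgotten. This is only a sketch: it assumes that the default page of the site is called index.html (or index.htm), which is a common convention but not guaranteed for every web server.

```python
def url_variants(url):
    """Return the common spellings under which one page may be linked."""
    base = url
    # Assumption: the default page of a directory is index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then searched separately with the link-search syntax of the chosen search system.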

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
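
The difference between the two approaches can be illustrated on a few invented HTML pages: one page cites the URL in a hyperlink, another only mentions it in its visible text. The sketch uses the standard Python html.parser module; the page contents are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collect the hyperlink targets and the visible text of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def cites(page_html, url):
    """Return (cited in a hyperlink, cited as a string in the text)."""
    parser = LinkAndTextExtractor()
    parser.feed(page_html)
    return url in parser.links, url in "".join(parser.text)


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
PAGES = {
    "page-1": '<p>See <a href="%s">this page</a>.</p>' % TARGET,
    "page-2": "<p>Source: %s</p>" % TARGET,
    "page-3": "<p>Nothing relevant here.</p>",
}

for name, html in PAGES.items():
    print(name, cites(html, TARGET))
```

A specialised link search would find only page-1, while a normal text search for the URL string would find only page-2, so the two methods complement each other.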

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
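
The contrast between the two orderings can be sketched as follows. The site names and link counts are invented, and counting incoming links is only the simplified picture given above, not the actual ranking method used by Google.

```python
# Hypothetical directory category with invented numbers of incoming links.
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
INLINKS = {"alpha.example": 12, "beta.example": 340, "gamma.example": 57}


def alphabetical(entries):
    """Ordering without link analysis."""
    return sorted(entries)


def by_links(entries, inlinks):
    """Ordering by number of received links, most linked first."""
    return sorted(entries, key=lambda site: (-inlinks.get(site, 0), site))


print(alphabetical(ENTRIES))
print(by_links(ENTRIES, INLINKS))
```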

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
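
The core of this mechanism, an index built from the texts of collected pages and a search function working on that index, can be sketched in a few lines. The pages and URLs below are invented stand-ins for what a crawler would collect, and combining all query words with AND is an assumption; real systems add ranking and many refinements.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    result = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))
```

The query is answered from the index alone, without contacting the original pages, which is why searching is fast but can return outdated results.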

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a query could contain a maximum of 10 words.)
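
The idea that both the number and the importance of linking pages count can be sketched with a simplified PageRank-style iteration. This is an illustration under assumptions (an invented five-page web, a damping factor of 0.85, a fixed number of iterations) and not the algorithm as implemented by Google.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a dict: page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical web: "c" receives many links; "d" is linked only by the
# important page "c"; "e" is linked only by the unimportant page "a".
LINKS = {"a": ["c", "e"], "b": ["c"], "c": ["d"], "d": ["c"]}
SCORES = rank_pages(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this example "d" and "e" each receive exactly one link, but "d" ends up with the higher score because its single link comes from a highly ranked page.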

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
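
What such an automatic expansion amounts to can be sketched as a rewrite of the query. The synonym table below is invented, and how the search engine chooses its synonyms internally is not documented here, so this only illustrates the principle.

```python
# Hypothetical synonym table; a real search engine uses its own data.
SYNONYMS = {"car": ["automobile", "auto"], "cheap": ["inexpensive"]}


def expand_tilde_query(query, synonyms=SYNONYMS):
    """Rewrite each '~word' as an OR group of the word and its synonyms."""
    expanded = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)
        else:
            expanded.append(token)
    return " ".join(expanded)


print(expand_tilde_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; the other words in the query are left unchanged.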

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
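
Recall and precision can be made concrete with a small calculation. The following is a minimal sketch in Python; the document identifiers and the two result lists are hypothetical and serve only to illustrate how a specialised system could score higher on both measures.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: d* are relevant documents, x* are irrelevant ones.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```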

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
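
When truncation is not offered, a searcher can list the word variants explicitly and combine them with OR. The sketch below illustrates that workaround; the vocabulary list and the function names are assumptions made for this example and are not part of any search system.

```python
def expand_truncation(stem, vocabulary):
    """Return the vocabulary words that start with `stem`, sorted."""
    return sorted(w for w in vocabulary if w.startswith(stem))


def or_query(words):
    """Join the words into one query string with the OR operator."""
    return " OR ".join(words)


# Hypothetical vocabulary; in practice the searcher thinks of the variants.
vocabulary = ["library", "libraries", "librarian", "librarians", "liberty"]

variants = expand_truncation("librar", vocabulary)
print(or_query(variants))
# librarian OR librarians OR libraries OR library
```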

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
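
Clustering "on the fly" can be illustrated with a toy example that groups the retrieved snippets by words they share. This is a sketch only, not the method used by Vivisimo, which is not described here; the stop word list, the function name and the snippets are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets, max_clusters=2):
    """Group snippets under the most frequent shared words (toy approach)."""
    query_terms = set(query.lower().split())
    tokenised = [
        {w for w in s.lower().split()
         if w not in STOPWORDS and w not in query_terms}
        for s in snippets
    ]
    # Count in how many snippets each word occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    # Words shared by at least two snippets become cluster headings.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [w for w, n in ranked if n > 1][:max_clusters]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Blaise Pascal biography",
]
for heading, items in cluster_results("pascal", snippets).items():
    print(heading, "->", items)
```

Nothing is computed before the search: the headings are derived from the result list itself, which is the point made on the slide.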

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
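
The client-based scheme can be sketched as a small program that sends one query to several search systems in parallel. This is a minimal illustration, not the way Copernic or any other product works; the two "engines" are hypothetical local functions that stand in for real search systems, and no network access takes place.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real Internet search systems.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send `query` to every engine in its own thread.

    Returns a dict that maps each engine name to its result list.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("thesaurus", {"A": engine_a, "B": engine_b})
print(results)
```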

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
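
Integration with removal of repeated results can be sketched as follows. The normalisation rule (ignore case, a leading "www." and a trailing slash) is an assumption chosen for this example; real meta-search systems may use other rules. The rule also ignores query strings, which is a simplification.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs.

    The first occurrence of each URL is kept.
    """
    seen, merged = set(), []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
list_1 = ["http://www.dogpile.com", "http://www.kartoo.com"]
list_2 = ["http://dogpile.com/", "http://www.mamma.com"]

print(merge_results([list_1, list_2]))
# ['http://www.dogpile.com', 'http://www.kartoo.com', 'http://www.mamma.com']
```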

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
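
The difference between a modified page and a new page can be sketched with stored fingerprints of the page text. This is an illustration under stated assumptions, not the method of any particular alert service; the URLs and page texts are hypothetical, and fetching the pages is left out.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text after collapsing whitespace."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_pages(stored, current_pages):
    """Compare current pages with the stored fingerprints.

    Returns (changed, new) lists of URLs and updates `stored` in place.
    """
    changed, new = [], []
    for url, text in current_pages.items():
        fp = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != fp:
            changed.append(url)
        stored[url] = fp
    return changed, new


stored = {}
# First visit: every page is new.
first = check_pages(stored, {
    "http://example.org/a": "version 1",
    "http://example.org/b": "unchanged text",
})
# Second visit: page a was modified, b differs only in whitespace, c is new.
second = check_pages(stored, {
    "http://example.org/a": "version 2",
    "http://example.org/b": "unchanged   text ",
    "http://example.org/c": "brand new page",
})
print(first)   # ([], ['http://example.org/a', 'http://example.org/b'])
print(second)  # (['http://example.org/a'], ['http://example.org/c'])
```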

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
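The query-formulation step described above can be sketched in a few lines of Python. This is only an illustration and not part of any of the thesaurus systems mentioned in these slides: the tiny thesaurus, its terms and the function name `expand_query` are all assumptions made up for the example.

```python
# A toy thesaurus; every term and relation below is invented for illustration.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],     # broader terms
        "NT": ["pacific ocean"],  # narrower terms
        "RT": ["coast"],          # related terms
        "UF": ["sea"],            # synonyms (use(d) for)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR-query made of the term plus terms found via the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so that they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean"))
```

Adding synonyms and narrower terms mainly raises recall; choosing a more suitable term instead of the original one is what can raise precision. The OR syntax is itself an assumption, as each database has its own query language.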

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
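The two directions of citation searching can be illustrated with a small sketch in Python. The citation data and the function names `snowball` and `cited_by` are hypothetical; a real citation index is of course far larger and is queried through its own interface.

```python
# Hypothetical citation data: each key cites the documents in its list.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, cites=CITES):
    """Follow reference lists back in time, starting from `doc` (snowball effect)."""
    found = []
    queue = list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found


def cited_by(doc, cites=CITES):
    """Citation-index lookup: which documents contain a citation to `doc`?"""
    return sorted(source for source, refs in cites.items() if doc in refs)


print(snowball("seed"))
print(cited_by("seed"))
```

Snowball searching only needs the reference lists of the documents themselves, while the `cited_by` lookup needs an index built over many publications, which is why it depends on a citation database.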

211

***-

Information retrieval:
using citations (scheme)
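(Scheme: a time axis with the labels Past, Now and Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.)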

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
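The variants listed above can be generated automatically before they are pasted into the link-search form of a search engine. The following Python sketch is only an illustration; the function name `url_variants` is made up, and it assumes that the default page of the site is named index.html or index.htm, which depends on the web server.

```python
def url_variants(url):
    """Return common spellings of the same address to try in a link search."""
    base = url.rstrip("/")
    # Assumption: the default page is index.html or index.htm.
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant has to be searched separately, because a search engine may treat them as different link targets.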

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 138

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
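The mechanism described above (a database of pages with a machine-made index, plus a search system on top of it) can be sketched as follows in Python. This is a strongly simplified illustration: the pages are an invented stand-in for what a robot would collect, and the function names `build_index` and `search` are assumptions, not the interface of any real search engine.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a "robot"; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data and marine science links",
}


def build_index(pages):
    """Build an inverted index: each word points to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all the words of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "marine"))
print(search(index, "marine hydrology"))
```

A query is answered from the index only, without contacting the original web servers, which is why the answer is fast but can be out of date.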

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
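The idea that a page ranks higher when many pages and “important” pages point to it can be illustrated with a simplified iteration in the style of the published PageRank method. This Python sketch is not the algorithm as it runs at Google, whose details are not given in these slides; the function name `link_rank`, the damping value 0.85 and the small link graph are assumptions for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented example: pages "a" and "b" link to "c", and "c" links back to "a".
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page "a" receives only one link but still scores higher than "b", because that link comes from the highest-scoring page "c".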

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.

• Most indexes grow and their “size ranking” is variable.

• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
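
Recall and precision can be made concrete with a small calculation. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 4 documents retrieved, 3 documents are relevant in total,
# 2 of the retrieved documents are relevant.
p, r = precision_recall(retrieved=["a", "b", "c", "d"], relevant=["a", "b", "e"])
print(p, r)
```

In this invented case half of the retrieved documents are relevant (precision 0.5) and two of the three relevant documents are found (recall about 0.67).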

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
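
What truncation does can be sketched as follows. The word list and the function name are invented for illustration; a real search system would match the truncated pattern against its own index of words. When a system offers no truncation, the searcher can type the variants and combine them with OR, which is what the last line builds.

```python
def truncate_match(pattern, words):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [w for w in words if w.startswith(stem)]
    return [w for w in words if w == pattern]

# Invented vocabulary, standing in for the words known to a search system.
words = ["library", "libraries", "librarian", "liberty"]

expanded = truncate_match("librar*", words)
query = " OR ".join(expanded)  # manual workaround when truncation is missing
print(query)
```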

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
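
A proximity operator such as NEAR checks whether two words occur within a few words of each other. The following is only a minimal sketch; the function name, the default distance of 3 words and the sample sentence are assumptions made for illustration.

```python
def near(text, word_a, word_b, max_distance=3):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = text.lower().split()
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "open access to scientific information"
close = near(sentence, "access", "scientific")  # positions 1 and 3
far = near(sentence, "open", "information")     # positions 0 and 4
print(close, far)
```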

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
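
Clustering on the fly can be illustrated with a very simple sketch. This is not the method used by Vivisimo, which is not described here; it only shows the general idea of deriving headings from the retrieved results themselves, at search time. The function name, the stop word list and the sample results are invented.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}

def cluster_results(query, snippets, min_size=2):
    """Group snippets under headings: words (other than the query words)
    that are shared by at least min_size snippets."""
    query_words = set(query.lower().split())
    counts = Counter()
    tokenized = []
    for snippet in snippets:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenized.append(words)
        counts.update(words)
    headings = [w for w, n in counts.most_common() if n >= min_size]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)

# Invented result titles for the ambiguous query "pascal".
snippets = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results("pascal", snippets)
print(clusters)
```

No document is processed before the search: the headings are computed only from the results that were retrieved for this query.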

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
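
The time saving comes from sending one query to several search systems in parallel. A minimal sketch of this multi-threaded approach is given below; the two engine functions are hypothetical stand-ins that return invented results, where a real client program would contact the search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_one(query):
    """Hypothetical stand-in for a first Internet search system."""
    return [f"engine-one result for {query}"]

def engine_two(query):
    """Hypothetical stand-in for a second Internet search system."""
    return [f"engine-two result for {query}"]

def meta_search(query, engines):
    """Send the same query to all engines in parallel and collect the hits."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, query) for engine in engines]
        return [hit for future in futures for hit in future.result()]

results = meta_search("open access", [engine_one, engine_two])
print(results)
```

The user types the query once, in one interface; the waiting time is roughly that of the slowest engine instead of the sum of all engines.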

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
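
The integration of results with removal of repeated results can be sketched as follows. This is only an illustration under simple assumptions: results are ranked lists of URLs, two URLs are treated as the same when host and path agree after ignoring "www." and a trailing slash, and the example.org addresses are invented.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Invented result lists from two primary search systems.
engine_a = ["http://www.example.org/a/", "http://example.org/b"]
engine_b = ["http://example.org/a", "http://example.org/c"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

The first hit of each engine points to the same page, so it appears only once in the merged list.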

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
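
How a tracking system can notice that a page was modified can be sketched with a fingerprint (hash) of the page contents. This is an assumption about one common approach, not a description of any particular service; the page texts are invented and, in practice, they would be downloaded from the WWW at regular intervals.

```python
import hashlib

def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with that of the current page."""
    return fingerprint(current_text) != stored_fingerprint

# Invented page contents at two moments in time.
old = fingerprint("<html>Conference programme 2004</html>")
changed = has_changed(old, "<html>Conference programme 2005</html>")
unchanged = has_changed(old, "<html>Conference programme 2004</html>")
print(changed, unchanged)
```

Only the fingerprint has to be stored between two visits; when it differs, an alert can be sent to the user/subscriber.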

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ (available since 2005)

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
(scheme, centred on a given Term)
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
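
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus `THESAURUS` and its entries are made up, the function names `expand_term` and `build_query` are hypothetical, and the AND/OR query syntax is assumed; the syntax that a real database accepts must be checked in its help pages.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only and are
# not taken from any real thesaurus system.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],          # synonyms (Use(d) For)
        "BT": ["body of water"],  # broader terms
        "NT": ["bay", "gulf"],    # narrower terms
        "RT": ["coast"],          # related terms
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term followed by thesaurus terms for the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, relations=("UF", "NT")):
    """Combine concepts with AND and the variants of each concept with OR."""
    groups = []
    for term in terms:
        variants = expand_term(term, relations)
        # Quote multi-word terms so that they are searched as phrases.
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
# prints: (sea OR ocean OR bay OR gulf) AND pollution
```

Adding synonyms and narrower terms with OR tends to increase recall; choosing more suitable terms for each concept tends to increase precision.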

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
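
The two directions of citation searching can be illustrated with a small Python sketch. The citation data in `REFERENCES` and the function names are hypothetical; a real citation index is of course built from millions of reference lists.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old-1", "old-2"],
    "old-1": ["older-1"],
    "old-2": [],
    "older-1": [],
    "new-1": ["seed"],
    "new-2": ["seed", "old-1"],
    "new-3": ["new-1"],
}


def snowball(seed, references, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in references.get(doc, []):
                if cited not in found and cited != seed:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


def cited_by(seed, references):
    """Return the (more recent) documents that cite the seed document."""
    return sorted(build_citation_index(references).get(seed, []))


print(snowball("seed", REFERENCES))  # older publications (snowball effect)
print(cited_by("seed", REFERENCES))  # more recent publications
```

The snowball search needs only the documents themselves, whereas the forward search needs an inverted index over all reference lists, which is why it depends on a citation index service.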

211

***-

Information retrieval:
using citations (scheme)
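Scheme: a time axis labelled Past, Now, Future, with the seed document on it.
Snowball citation searching leads from the seed document back into the past;
citation indexing leads from the seed document forward to more recent publications.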

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
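
The variants listed above can be generated automatically. The following Python sketch is only an illustration: the function name `url_variants` is hypothetical, and the list of default file names (`index.html`, `index.htm`) is an assumption that may have to be extended for a particular web server.

```python
def url_variants(url):
    """Return the variants of a URL under which other pages may link to it."""
    # Drop the scheme (http:, https:) but keep the leading "//".
    rest = "//" + url.split("://", 1)[-1]
    variants = [rest]
    # Drop a trailing default file name (assumed list of names).
    for default_name in ("index.html", "index.htm"):
        if rest.endswith("/" + default_name):
            rest = rest[: -len(default_name)]
            variants.append(rest)
            break
    # Drop the trailing slash.
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, with the query syntax that the chosen search system requires.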

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
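
The difference between the two orderings can be shown with a tiny Python sketch. The site names and link counts are made up for illustration, and counting links is only the "simply stated" version given above, not the actual link analysis used by Google.

```python
# Hypothetical directory category: site name -> number of links received.
entries = {
    "Alpha Marine Lab": 12,
    "Zeta Ocean Portal": 340,
    "Beta Fisheries": 57,
}

# Alphabetical order, as in the original directory.
alphabetical = sorted(entries)

# Ordered by the number of links received, highest first.
by_links = sorted(entries, key=lambda site: entries[site], reverse=True)

print(alphabetical)
print(by_links)
```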

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
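
The mechanism can be illustrated with a minimal Python sketch of an inverted index. The pages in `PAGES` are hypothetical and stand for what a crawler would have collected; real systems add ranking, stemming, phrase searching and much more. The point of the sketch is that a query is answered from the prebuilt index, not by visiting the pages at search time.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and sea level data",
    "http://example.org/c": "Freshwater biology",
}


def build_index(pages):
    """Build the index: each word maps to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "sea biology"))
# prints: ['http://example.org/a']
```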

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
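
The two ranking criteria can be illustrated with a simplified link-analysis calculation in the style of the published PageRank idea. This Python sketch is not Google's actual algorithm: the function name `link_rank`, the damping value 0.85, the number of iterations and the small link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}

rank = link_rank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph, "hub" ranks highest because most pages point to it, and "a" ranks above "b" because it receives one extra link.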

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers access not only to files in HTML format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
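
Recall and precision, mentioned above as the two ways in which a result set can be better, can be made concrete with a small calculation. The following is only a minimal sketch; the function name and the document identifiers are hypothetical and chosen for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents returned by the search system
    relevant:  documents that are actually relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# while 6 relevant documents exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
print(p, r)
```

A specialised system that covers more of the relevant sources raises recall; one that excludes irrelevant sources raises precision.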

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
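
To make the two terms concrete: the sketch below shows what truncation and stemming mean in general. It is an illustration only, with a deliberately crude stemmer and a hypothetical word list; it does not describe how any particular search system implements these features.

```python
def truncation_match(pattern, words):
    """Right-hand truncation: 'librar*' matches every word starting with 'librar'."""
    prefix = pattern.rstrip("*").lower()
    return [w for w in words if w.lower().startswith(prefix)]


def naive_stem(word):
    """Very crude stemmer: strips a few common English suffixes (illustration only)."""
    word = word.lower()
    for suffix in ("ies", "ing", "es", "s"):
        # Keep at least three characters so that short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical vocabulary of indexed words
vocabulary = ["library", "libraries", "librarian", "liberty", "searching", "searches"]
print(truncation_match("librar*", vocabulary))
print({w: naive_stem(w) for w in vocabulary})
```

With truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form.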

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
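
As an illustration of what a proximity operator such as NEAR does in systems that offer it, here is a minimal sketch. The function name, the default distance and the sample sentence are assumptions made for this example.

```python
def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    # Split on whitespace and strip simple punctuation from each word.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sample = "Open access journals are indexed in a directory."
print(near(sample, "open", "journals", max_distance=2))
print(near(sample, "open", "directory", max_distance=3))
```

A plain AND query would match the sample for both pairs of terms; the proximity test is stricter because it also takes the distance between the words into account.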

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
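
The idea of clustering results on the fly can be sketched as follows. This is NOT the method used by Vivisimo, which is not described here; it is a simple stand-in that groups each result under the word it shares with the most other results. The stopword list and the sample result titles are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "for", "of", "the"}


def tokens(snippet, query):
    """Lower-cased words of a snippet, without stopwords and query words."""
    skip = STOPWORDS | {q.lower() for q in query.split()}
    return {w for w in snippet.lower().split() if w not in skip}


def cluster_results(snippets, query):
    """Group snippets under the word they share with the most other snippets."""
    token_sets = [tokens(s, query) for s in snippets]
    # In how many snippets does each word occur?
    doc_freq = Counter(w for ts in token_sets for w in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, token_sets):
        if not ts:
            clusters["(other)"].append(snippet)
            continue
        # Highest document frequency wins; ties are broken alphabetically.
        label = min(ts, key=lambda w: (-doc_freq[w], w))
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal"
results = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pensees by the philosopher Blaise Pascal",
    "Turbo Pascal compiler for the programming language",
]
for label, members in cluster_results(results, "pascal").items():
    print(label, members)
```

Only the retrieved results are analysed, after the search; nothing is computed in advance for the documents themselves.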

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you start
to use the system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
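
The integration step described above (one query sent to several primary search systems, results merged, repeated results removed) can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists, and the URL normalisation is deliberately crude; a real meta-search system would have to query real services and handle many more cases.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(url):
    """Crude URL normalisation so that trivial variants count as the same hit."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def meta_search(query, engines):
    """Send one query to several engines in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for hits in result_lists:
        for url in hits:
            key = normalise(url)
            if key not in seen:  # skip results that were already delivered
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical stand-ins for primary search systems
def engine_a(query):
    return ["http://www.example.org/a", "http://example.org/b/"]


def engine_b(query):
    return ["http://example.org/a/", "http://www.example.org/c"]


print(meta_search("open access", [engine_a, engine_b]))
```

The parallel calls correspond to the "multi-threaded" character of such systems: the waiting time is roughly that of the slowest primary system, not the sum of all of them.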

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a search in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
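
The difference between a modified page and a new page can be illustrated with a minimal sketch. It assumes that the page texts have already been fetched (fetching is left out) and that a fingerprint of each page from the previous check has been stored; all names are hypothetical, and real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text):
    """Hash of the page content; whitespace is collapsed so re-wrapping is ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(current_pages, stored_fingerprints):
    """Compare freshly fetched pages with the fingerprints of the previous check.

    current_pages:       dict url -> page text
    stored_fingerprints: dict url -> fingerprint (updated in place)
    Returns (new_urls, changed_urls).
    """
    new_urls, changed_urls = [], []
    for url, text in current_pages.items():
        fp = fingerprint(text)
        if url not in stored_fingerprints:
            new_urls.append(url)          # page not seen before
        elif stored_fingerprints[url] != fp:
            changed_urls.append(url)      # known page with modified content
        stored_fingerprints[url] = fp
    return new_urls, changed_urls


store = {}
print(check_pages({"http://example.org/news": "First version"}, store))
print(check_pages({"http://example.org/news": "Second version"}, store))
```

An alert service would run such a check at regular intervals and send the two lists to the subscriber by email.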

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
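
How a stored interest profile can be matched against newly published books can be sketched as follows. The record fields (title, subject) and the profile structure are assumptions made for this illustration; they do not describe the system of any particular bookshop.

```python
def matches_profile(book, profile):
    """True if a new book fits the stored interest profile of a user."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    subject_hit = book.get("subject") in profile.get("subjects", [])
    return keyword_hit or subject_hit


# Hypothetical profile and hypothetical new titles
profile = {"keywords": ["thesaurus", "retrieval"], "subjects": ["Library science"]}
new_books = [
    {"title": "Information retrieval in practice", "subject": "Computer science"},
    {"title": "Managing collections", "subject": "Library science"},
    {"title": "Cooking for beginners", "subject": "Food"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```

The first book matches through a keyword, the second through its subject category; the third matches neither and would not trigger an alert.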

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU, and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
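
The use of a thesaurus before searching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus `THESAURUS`, its entries, and the functions `expand_term` and `build_query` are hypothetical, and the `AND` / `OR` query syntax is assumed; the target database may use different operators.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
# UF = synonyms, BT = broader, NT = narrower, RT = related terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["lagoon"],
        "RT": ["coast"],
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the thesaurus terms found via the given relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts, thesaurus, relations=("UF", "NT")):
    """OR together the variants of each concept, then AND the concepts.

    The AND / OR syntax is an assumption about the target database.
    """
    groups = []
    for concept in concepts:
        variants = expand_term(concept, thesaurus, relations)
        # Multi-word terms are quoted so that they are searched as phrases.
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        group = " OR ".join(quoted)
        groups.append("(%s)" % group if len(quoted) > 1 else group)
    return " AND ".join(groups)


print(build_query(["sea", "pollution"], THESAURUS))
```

Adding synonyms and narrower terms with OR aims at higher recall; choosing the more suitable term for each concept aims at higher precision.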

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
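
The two directions of citation searching can be illustrated with a small Python sketch. The data in `REFERENCES` and the function names are hypothetical; a real citation index is built from the reference lists of a large number of publications.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "x": ["seed"],
    "y": ["x", "a"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents citing it."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


citation_index = build_citation_index(REFERENCES)
print(snowball("seed", REFERENCES))      # older publications
print(citation_index.get("seed", []))    # more recent publications citing the seed
```

The length of a document's entry in such an index is the number of citations it has received within the data covered.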

211

***-

Information retrieval:
using citations (scheme)
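Scheme: a time axis (past, now, future) with the seed document on it;
snowball citation searching leads back to older publications,
citation indexing leads forward to more recent publications.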

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles
that contain citations of a particular document or author,
for instance to study the impact in science of a particular
idea or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
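
Listing the variants by hand is error-prone, so they can be generated. The Python sketch below is a minimal illustration; the function name `url_variants` is hypothetical, and it assumes that the default page of the site is called `index.html`, which is not true for every web server.

```python
INDEX_NAME = "index.html"  # assumed name of the default page


def url_variants(url):
    """Return the spellings of a site address that are worth searching for separately."""
    # Drop the scheme (http://, https://) if present, and any trailing slash.
    rest = url.split("://", 1)[-1].rstrip("/")
    base = rest
    last_part = rest.rsplit("/", 1)[-1]
    if last_part == INDEX_NAME:
        base = rest[: -len(INDEX_NAME)].rstrip("/")
    return [base + "/" + INDEX_NAME, base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, following the query syntax of the chosen search system.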

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific
web document or web site that we already know, we can
NOT ONLY use an Internet index / search engine with
specialised searching in the hyperlink fields of web pages,
but ALSO search more “normally” for web documents/pages
that contain words or exact phrases that occur in the URL
of that specific web document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
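
The idea of a “normal” search for the URL as a text string can be sketched in Python on local data. The dictionary `pages` and the function `pages_mentioning` are hypothetical; with a real search engine the same effect would be sought by entering the URL as an exact phrase.

```python
def pages_mentioning(url, pages):
    """Return the addresses of the pages whose text contains the given URL string."""
    needle = url.lower()
    return [address for address, text in pages.items() if needle in text.lower()]


# Hypothetical collection of page texts, keyed by page address.
pages = {
    "http://a.example/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details",
    "http://b.example/":
        "Nothing relevant here",
}

print(pages_mentioning(
    "http://www.computer.com/directory/subdirectory/web-page.html", pages))
```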

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
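
The difference between alphabetical order and an order based on received links can be shown with a small Python sketch. It follows only the "simply stated" rule above (count the links received); the site names, the link data, and the function `rank_by_inbound_links` are hypothetical, and the actual link analysis by Google is more refined than a plain count.

```python
def rank_by_inbound_links(category_sites, links):
    """Order the sites of one directory category by the number of links they receive.

    `links` is a list of (source, target) pairs; ties are broken alphabetically.
    """
    counts = {site: 0 for site in category_sites}
    for source, target in links:
        if target in counts and source != target:
            counts[target] += 1
    return sorted(category_sites, key=lambda site: (-counts[site], site))


# Hypothetical category and link data.
sites = ["b.example", "a.example", "c.example"]
links = [("x.example", "c.example"),
         ("y.example", "c.example"),
         ("x.example", "a.example")]

print(sorted(sites))                        # alphabetical order
print(rank_by_inbound_links(sites, links))  # order by links received
```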

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: a global subject directory can lead to
• the complete WWW
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
Scheme of the components:
• User searching for Internet based information
• Internet client hardware and software
• user interface to a search engine
• Internet index search engine
• database of Internet files, including an index
• Internet crawler and indexing system
• Internet information source

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
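
The mechanism described above can be imitated on a very small scale in Python. This is a sketch under stated assumptions: the "robot" is replaced by a ready-made dictionary `pages` of already collected texts (no network access), the function names are hypothetical, and the query words are combined with AND, which is one possible design choice.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: for each word, the set of URLs of the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain all the words of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical pages, as if collected earlier by the "robot".
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}
index = build_index(pages)
print(search(index, "marine"))
print(search(index, "marine sea"))
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why such systems can respond quickly.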

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by separately
using the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine Alltheweb, and the Internet database
Inktomi have all been owned by one U.S. Internet company:
Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
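
Both criteria (many pages pointing to a page, and "important" pages pointing to it) can be illustrated with a simplified link-analysis computation in the spirit of the published PageRank idea. The Python sketch below is not the ranking used in practice by Google; the damping value 0.85, the number of iterations, the handling of pages without outgoing links, and the example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute a simplified PageRank score for each page.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b point to c, c points to d.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["d"], "d": []})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example page d receives only one link, but that link comes from the well-linked page c, so d still scores higher than a and b, which receive no links at all.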

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
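
When queries are prepared by a program, the tilde can be added automatically to selected words. The Python sketch below only builds the query string as described on this slide; the function name `add_synonym_operator` is hypothetical, and whether a given search system honours the tilde has to be checked in its help pages.

```python
def add_synonym_operator(query, words_to_expand):
    """Put a tilde in front of the selected words of a query."""
    selected = {word.lower() for word in words_to_expand}
    parts = []
    for word in query.split():
        if word.lower() in selected and not word.startswith("~"):
            parts.append("~" + word)
        else:
            parts.append(word)
    return " ".join(parts)


print(add_synonym_operator("word1 word2 word3 word4", ["word2"]))
```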

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
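
Recall and precision are not defined on this slide. The sketch below uses the usual definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 relevant documents exist, 2 of them were found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p)  # 0.5
print(round(r, 2))  # 0.67
```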

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
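
The integration step with removal of repeated results could look roughly like the Python sketch below. It is an assumption about one possible approach, not a description of any particular meta-search system: the ranked lists are interleaved round-robin, and two URLs count as the same result when their host (without "www.") and path match. The function names and sample lists are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without a
    leading 'www.', plus the path without a trailing slash.
    Simplification: the query string and fragment are ignored."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping only the
    first occurrence of each (normalised) URL."""
    seen = set()
    merged = []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Made-up result lists from two primary search systems.
engine_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_b = ["http://dogpile.com", "http://www.mamma.com"]
print(merge_results([engine_a, engine_b]))
# ['http://www.dogpile.com/', 'http://www.kartoo.com', 'http://www.mamma.com']
```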

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of what is accessible through the Internet:]
• telnet, ftp, ...
• WWW:
»static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»databases and file archives accessible through
the Internet (CGI, ASP,...)
»information accessible only when passwords are used
»rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
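
Tracking changes in an existing page can be sketched in Python by storing a fingerprint of the page content and comparing it on the next visit. This is a simplified assumption about how such a tool might work, not the method of any product named here; downloading the page is left out, and the function names and the example URL are hypothetical.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(page_bytes).hexdigest()


def check_for_update(url, page_bytes, known):
    """Compare the page content with the fingerprint stored in `known`.

    Returns True when the page is new or has changed, and stores the
    new fingerprint for the next check."""
    new = fingerprint(page_bytes)
    changed = known.get(url) != new
    known[url] = new
    return changed


known = {}
url = "http://example.org/"
print(check_for_update(url, b"<p>version 1</p>", known))  # True (first visit)
print(check_for_update(url, b"<p>version 1</p>", known))  # False (unchanged)
print(check_for_update(url, b"<p>version 2</p>", known))  # True (modified)
```

A real service would also have to ignore irrelevant differences, such as a date stamp that changes on every download.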

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.
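
An alert service based on stored queries can be pictured as follows: the stored query is run again at intervals and only results that were not reported before are sent to the subscriber. The Python sketch below illustrates that idea under stated assumptions; it is not Google's implementation, and the function name, query and URLs are made up.

```python
def new_hits(stored_query, current_results, already_reported):
    """Return the results for `stored_query` that were not reported
    before, and remember them in `already_reported`."""
    reported = already_reported.setdefault(stored_query, set())
    fresh = []
    for url in current_results:
        if url not in reported:
            reported.add(url)
            fresh.append(url)
    return fresh


history = {}
query = "open access journals"
first = new_hits(query, ["http://a.example/", "http://b.example/"], history)
second = new_hits(query, ["http://a.example/", "http://c.example/"], history)
print(first)   # ['http://a.example/', 'http://b.example/']
print(second)  # ['http://c.example/']
```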

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
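The workflow above (look up a term in a thesaurus first, then build the query from the terms found) can be sketched in a few lines of Python. This is only an illustration: the `ThesaurusEntry` class, the `expand_query` function and the sample entry for "sea" are hypothetical and are not taken from any real thesaurus system. The sketch joins the term, its UF synonyms and optionally its NT terms with OR, which mainly helps recall.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One thesaurus record with the classical relation types."""
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


def expand_query(term: str, thesaurus: dict[str, ThesaurusEntry],
                 include_narrower: bool = True) -> str:
    """Return an OR-query built from the term, its synonyms and,
    optionally, its narrower terms."""
    entry = thesaurus.get(term)
    if entry is None:
        return f'"{term}"'  # unknown term: search it unchanged
    candidates = [entry.term] + entry.used_for
    if include_narrower:
        candidates += entry.narrower
    unique = []
    for candidate in candidates:  # remove duplicates, keep order
        if candidate not in unique:
            unique.append(candidate)
    return " OR ".join(f'"{t}"' for t in unique)


# Hypothetical sample data, for illustration only.
THESAURUS = {
    "sea": ThesaurusEntry(
        term="sea",
        broader=["body of water"],
        narrower=["inland sea"],
        related=["coast"],
        used_for=["ocean"],
    )
}

print(expand_query("sea", THESAURUS))
```

Leaving out the narrower terms (`include_narrower=False`) keeps the query closer to the original concept, which tends to favour precision over recall.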

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
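The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the citation pairs are invented for the example; a real citation index holds the same kind of (citing, cited) pairs, only many more of them. Following references leads backwards in time (the snowball effect), while the inverted mapping, which is what a citation index offers, leads forwards to more recent documents.

```python
from collections import defaultdict

# Hypothetical (citing document, cited document) pairs.
CITATIONS = [("D", "C"), ("C", "A"), ("C", "B"), ("E", "C")]


def build_indexes(pairs):
    """Build both lookup directions from a list of citation pairs."""
    references = defaultdict(set)  # document -> older documents it cites
    cited_by = defaultdict(set)    # document -> newer documents citing it
    for citing, cited in pairs:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def snowball(seed, references):
    """Follow reference lists repeatedly, starting from the seed."""
    found, to_visit = set(), [seed]
    while to_visit:
        document = to_visit.pop()
        for older in references.get(document, ()):
            if older not in found:
                found.add(older)
                to_visit.append(older)
    return found


references, cited_by = build_indexes(CITATIONS)
print(sorted(snowball("D", references)))  # older documents reachable from D
print(sorted(cited_by["C"]))              # newer documents that cite C
```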

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
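As a rough sketch of how such counts could be derived from the results of a citation search, the Python fragment below tallies citations per article and per author. The article identifiers, author names and citation pairs are invented for the example, and the counting rule (each citation of an article counts once for each of its authors) is an assumption, not a standard prescribed by any citation database.

```python
from collections import Counter

# Hypothetical sample data: authors per article, and (citing, cited) pairs.
ARTICLE_AUTHORS = {"A1": ["Smith"], "A2": ["Smith", "Jones"], "A3": ["Jones"]}
CITATIONS = [("A2", "A1"), ("A3", "A1"), ("A3", "A2")]


def citations_per_article(pairs):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in pairs)


def citations_per_author(pairs, article_authors):
    """Credit every citation of an article to each of its authors."""
    counts = Counter()
    for _citing, cited in pairs:
        for author in article_authors.get(cited, []):
            counts[author] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, ARTICLE_AUTHORS))
```

Such counts are only estimates: they depend on which documents the underlying database covers.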

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
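Generating these variants by hand is error-prone, so a small helper can do it. The Python sketch below is an illustration only: the function name `url_variants` and the list of default page names are assumptions, and the exact variants that matter depend on the web server and on the search system used.

```python
DEFAULT_PAGES = ("index.html", "index.htm")  # assumed default page names


def url_variants(url: str) -> list[str]:
    """Return the address without its scheme, plus the directory form
    with and without a trailing slash when the address points to a
    default page or a directory."""
    rest = url.split("://", 1)[-1]  # drop "http://" and similar
    if rest.startswith("//"):
        rest = rest[2:]
    variants = [rest]
    base = rest
    for name in DEFAULT_PAGES:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # keep the trailing slash
            break
    if base.endswith("/"):
        for variant in (base, base.rstrip("/")):
            if variant not in variants:
                variants.append(variant)
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the chosen search system.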

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
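The simple statement above, that more incoming links give a higher position, can be sketched as follows in Python. The site names and links are invented, and counting raw incoming links is a deliberate simplification for illustration; it is not a description of the link analysis that Google actually performs.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Sort the sites of one directory category by the number of
    incoming links; ties are broken alphabetically."""
    inlinks = Counter(target for source, target in links if source != target)
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))


# Hypothetical sample data: (linking page, linked site) pairs.
SITES = ["a.example", "b.example", "c.example"]
LINKS = [("x.example", "b.example"),
         ("y.example", "b.example"),
         ("x.example", "c.example")]

print(rank_by_inlinks(SITES, LINKS))
```

Sorting by site name alone, `sorted(SITES)`, would reproduce the purely alphabetical order of the DMOZ directory.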

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
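The core of such a database is an index that maps each word to the pages that contain it, built in advance so that a query never has to visit the pages themselves. The Python sketch below shows the idea on three invented pages; the page texts and addresses are hypothetical, and a real system adds crawling, ranking and far more elaborate text processing.

```python
import re
from collections import defaultdict

# Hypothetical pages, as a crawler might have collected them.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain all the words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine biology")))
```

Because the index is built beforehand, a search only reflects the state of the pages at the time the robot last collected them.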

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, like the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
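The two ranking rules above (many pages point to it, and important pages point to it) can be illustrated with a simplified PageRank-style calculation in Python. This is a sketch under stated assumptions: the four-page link graph is invented, the damping factor of 0.85 and the number of iterations are conventional choices made here for illustration, and the code does not claim to reproduce the algorithm that Google runs in practice.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Repeatedly pass each page's score along its outgoing links.
    `links` maps a page to the list of pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D point to C; C points to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

for page, score in sorted(pagerank(LINKS).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this small graph, C scores highest because three pages point to it, and A scores higher than B or D because its single incoming link comes from the important page C.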

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
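What the tilde asks for can be pictured as replacing the marked word by an OR-group of the word and its synonyms. The Python sketch below shows this on the client side; the synonym list and the function name `expand_tilde_query` are hypothetical, and the code only illustrates the idea, not how Google processes the operator internally.

```python
# Hypothetical synonym list, for illustration only.
SYNONYMS = {"car": ["auto", "automobile"]}


def expand_tilde_query(query, synonyms):
    """Replace each ~word by (word OR synonym1 OR synonym2 ...)."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_tilde_query("cheap ~car rental", SYNONYMS))
```

An expanded query of this kind can also be typed by hand in retrieval systems that support OR but have no tilde operator.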

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
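
Recall and precision can be made concrete with a small calculation. The sketch below assumes that we know which documents are relevant, which is rarely the case in practice; the document identifiers and the function name precision_recall are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents has been retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here 2 of the 4 retrieved documents are relevant (precision 1/2) and 2 of the 3 relevant documents were found (recall 2/3).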

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
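
When truncation is not offered, a searcher can list the word variants explicitly and combine them with OR, in systems that support an OR operator. The sketch below illustrates this workaround under assumptions: the vocabulary list and the function names are hypothetical, and nothing here is a feature of Google itself.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words that start with the stem of a pattern like 'librar*'."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    matches = sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Fall back to the bare stem when the vocabulary has no matching word.
    return matches or [stem]


def or_query(pattern, vocabulary):
    """Build an explicit OR query that replaces the truncated term."""
    return " OR ".join(expand_truncation(pattern, vocabulary))


# Hypothetical vocabulary of known words.
words = ["library", "libraries", "librarian", "liberty"]
query = or_query("librar*", words)
```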

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
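
To show what a proximity operator such as NEAR is expected to do, here is a minimal sketch. It assumes a simple definition (two words occur within a given number of word positions of each other); the function name near and the default distance are hypothetical, and real retrieval systems may define proximity differently.

```python
import re


def near(text, word1, word2, max_distance=5):
    """Return True when word1 and word2 occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)


sentence = "open access to scientific journals"
close_enough = near(sentence, "open", "journals", max_distance=4)
too_far = near(sentence, "open", "journals", max_distance=3)
```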

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
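
Clustering on the fly can be illustrated with a deliberately naive sketch: each retrieved snippet is grouped under the word that it shares with the largest number of other results. This is an assumption made for illustration only; the stopword list, the snippets and the function names are hypothetical, and this is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "by", "for", "of", "the", "to"}


def content_words(text, query):
    """Words of a snippet, without stopwords and without the query words."""
    words = set(re.findall(r"\w+", text.lower()))
    return words - STOPWORDS - set(query.lower().split())


def cluster_results(snippets, query):
    """Group snippets under their most widely shared word, computed at search time."""
    word_sets = [content_words(s, query) for s in snippets]
    doc_freq = Counter(w for ws in word_sets for w in ws)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        if words:
            # Most widely shared word; ties are resolved alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical results for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Thoughts by the philosopher Blaise Pascal",
    "Free Pascal compiler for the programming language",
]
clusters = cluster_results(snippets, "pascal")
```

With these snippets, the results about the person and the results about the programming language end up in two separate groups, without any processing of the documents before the search.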

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
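
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration under assumptions: engine_a and engine_b are stub functions standing in for real search systems, the URL normalisation is a crude simplification, and the interleaving order is only one of many possible merging strategies.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(url):
    """Crude key to recognise the same page behind slightly different URLs."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url


def meta_search(query, engines):
    """Query all engines in parallel threads and merge their ranked results."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave: take rank 1 of every engine, then rank 2, and so on.
    for rank in range(max((len(r) for r in result_lists), default=0)):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:  # skip repeated results
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Stub engines (hypothetical); real ones would send the query over the Internet.
def engine_a(query):
    return ["http://www.example.org/a", "http://example.org/b"]


def engine_b(query):
    return ["http://example.org/a/", "http://example.org/c"]


merged_results = meta_search("open access", [engine_a, engine_b])
```

The page found by both stub engines appears only once in the merged list.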

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
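
How an alert service can distinguish new pages from changed pages is sketched below. This is a simplified illustration, assuming that the page contents have already been downloaded; the function names and the in-memory store are hypothetical, and a real service would also fetch the pages and send the alerts by email.

```python
import hashlib


def fingerprint(page_content):
    """Short, fixed-length summary of a page; it changes when the content changes."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def detect_changes(stored, current_pages):
    """Compare current pages with stored fingerprints.

    stored: {url: fingerprint} from the previous check (updated in place).
    current_pages: {url: content} as downloaded now.
    Returns (new_urls, changed_urls).
    """
    new, changed = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return new, changed


store = {}
first = detect_changes(store, {"http://example.org/p1": "version 1"})
second = detect_changes(
    store,
    {"http://example.org/p1": "version 2", "http://example.org/p2": "hello"},
)
```

On the first check every page is reported as new; on the second check the modified page is reported as changed and the additional page as new.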

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
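
Matching new books against a stored interest profile can be sketched as follows. The record layout (title, subjects), the profile fields (keywords, categories) and the sample data are hypothetical assumptions made for illustration; they do not describe how Amazon or any other bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True when a book record fits the keywords or subject categories of a profile."""
    subjects = book.get("subjects", [])
    text = (book["title"] + " " + " ".join(subjects)).lower()
    keyword_hit = any(k.lower() in text for k in profile.get("keywords", []))
    category_hit = bool(set(subjects) & set(profile.get("categories", [])))
    return keyword_hit or category_hit


def alerts(new_books, profile):
    """Titles of the new books about which the user should be alerted."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical interest profile and newly published books.
profile = {"keywords": ["information retrieval"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Modern Information Retrieval", "subjects": ["Computer science"]},
    {"title": "Life in the North Sea", "subjects": ["Marine biology"]},
    {"title": "A History of Brussels", "subjects": ["History"]},
]
to_announce = alerts(new_books, profile)
```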

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/PubMed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also small versions of the images themselves
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screenshot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the records in the database to be searched have NOT
been enriched with descriptors taken from a controlled
list of words and terms, the searcher can first consult one
or several thesaurus systems to find additional and more
suitable words and terms, and then use these to
formulate a query for that database (to increase recall
and precision).
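
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-thesaurus `THESAURUS`, its entries and the function names are hypothetical, and the Boolean syntax (AND, OR, quoted phrases) must be adapted to the database that is actually searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only and are
# not taken from any of the thesaurus systems mentioned in these slides.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "RT": ["marine environment"]},
    "pollution": {"UF": ["contamination"], "NT": [], "RT": []},
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term followed by the terms found via the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def quote(term):
    """Put multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(concepts):
    """OR the variants of each concept together, then AND the concepts."""
    groups = []
    for concept in concepts:
        variants = [quote(t) for t in expand_term(concept)]
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


query = build_query(["sea", "pollution"])
print(query)
```

Adding synonyms and narrower terms with OR mainly increases recall; combining the concepts with AND keeps precision under control.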

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
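
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the `REFERENCES` data are hypothetical; a real citation index is of course far larger, but the principle of following reference lists backwards and looking up citing documents forwards is the same.

```python
# Hypothetical citation data: each document is mapped to the older
# documents that appear in its reference list.
REFERENCES = {
    "seed-2000": ["a-1995", "b-1998"],
    "a-1995": ["c-1990"],
    "b-1998": ["c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found = set()
    frontier = {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in REFERENCES.get(doc, [])} - found
        found |= frontier
    return sorted(found)


def cited_by(target):
    """What a citation index offers: the later documents that cite `target`."""
    return sorted(doc for doc, refs in REFERENCES.items() if target in refs)


older = snowball("seed-2000")   # older publications, found via reference lists
newer = cited_by("seed-2000")   # more recent publications, found via a citation index
print(older, newer)
```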

211

***-

Information retrieval:
using citations (scheme)
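[Scheme along a time axis (past, now, future):
starting from the seed document, snowball citation searching
leads to older publications (towards the past), while citation
indexing leads to more recent publications (towards the future).]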

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
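
As an illustration, the counts per journal article and per author can be derived from the results of a citation search as sketched below. All names and data are hypothetical, and the result is only an estimate: it depends on the coverage of the database, and this sketch does not exclude self-citations.

```python
from collections import Counter

# Hypothetical results of a citation search.
AUTHORS = {
    "paper-1": ["Jansen"],
    "paper-2": ["Jansen", "Peeters"],
    "paper-3": ["Peeters"],
}
# Each pair is (citing article, cited article).
CITATIONS = [
    ("paper-2", "paper-1"),
    ("paper-3", "paper-1"),
    ("paper-3", "paper-2"),
]


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, authors):
    """Credit every author of a cited article with one citation."""
    counts = Counter()
    for _citing, cited in citations:
        for author in authors.get(cited, []):
            counts[author] += 1
    return counts


per_article = citations_per_article(CITATIONS)
per_author = citations_per_author(CITATIONS, AUTHORS)
print(per_article, per_author)
```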

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
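
A small helper can generate these variants so that none is forgotten. This is a minimal sketch that assumes the address points to a web site or directory (or to its index page) and that the index page is named index.html; the function name is hypothetical, and the query syntax for link searching still has to be taken from the help pages of the search system.

```python
def url_variants(url, index_name="index.html"):
    """List spellings under which the same directory-level address may be linked."""
    address = url.split("://", 1)[-1]           # drop "http://" if present
    if address.endswith("/" + index_name):
        address = address[: -len(index_name)]   # strip the index page name
    base = address.rstrip("/")
    return [f"{base}/{index_name}", f"{base}/", base]


variants = url_variants("http://web-server-computer.country/website/")
for variant in variants:
    print(variant)
```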

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 142

1

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory can lead to
• the complete WWW
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
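
The mechanism can be illustrated with a minimal Python sketch of such an index. The pages, URLs and function names are hypothetical, and the sketch leaves out everything that makes a real system work at the scale of the WWW (crawling, updating, relevance ranking); it only shows why a query can be answered quickly from a prebuilt index instead of by reading pages in real time.

```python
import re
from collections import defaultdict

# Hypothetical pages, as a "robot" might have collected them from the Internet.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology databases free of charge",
    "http://example.org/c": "Open archives for scholarly journals",
}


def build_index(pages):
    """Build an inverted index: each word is mapped to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "open journals"))
```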

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that at most 10 words could be used in a query.)
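
The idea that a page ranks higher when many pages and “important” pages point to it can be illustrated with a simplified link-analysis computation. This is only a sketch in the spirit of the published PageRank idea, not the algorithm that Google actually runs; the link data, the function name and the damping value 0.85 are assumptions made for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score, computed by repeated redistribution.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: "home" receives the most links, "c" receives none.
LINKS = {"home": ["a", "b"], "a": ["home"], "b": ["home", "a"], "c": ["home"]}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example the page that receives the most links ends up with the highest score, and a link from a high-scoring page contributes more than a link from a low-scoring one.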

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
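
What such an automatic expansion might do internally can be sketched as follows. This is only an illustration, not Google's implementation: the synonym table, the OR-group output syntax and the function name `expand_query` are assumptions.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("rent ~cheap car brussels"))
# prints: rent (cheap OR inexpensive OR affordable) car brussels
```

Only the word marked with the tilde is expanded; "car" stays as typed even though the table holds synonyms for it.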

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
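
Recall and precision are the two standard measures behind "better result sets": precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved. A minimal sketch, with invented document identifiers and an assumed function name:

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 5 documents are actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_and_recall(retrieved_docs, relevant_docs)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 5 relevant documents were found (recall 0.4).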

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
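
To make the missing feature concrete, the sketch below shows what manual right-hand truncation does in a system that supports it. The `*` wildcard, the word list and the function name `truncation_matches` are assumptions for illustration and do not reflect the syntax of any particular search system.

```python
import re


def truncation_matches(pattern, words):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        # The stem must be followed by zero or more word characters.
        regex = re.compile(re.escape(pattern[:-1]) + r"\w*$", re.IGNORECASE)
    else:
        regex = re.compile(re.escape(pattern) + "$", re.IGNORECASE)
    return [word for word in words if regex.match(word)]


index_words = ["library", "libraries", "librarian", "liberty", "Librarians"]
print(truncation_matches("librar*", index_words))
# prints: ['library', 'libraries', 'librarian', 'Librarians']
```

Without truncation or stemming, the searcher has to type each of these word forms separately to cover the concept.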

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
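
A proximity operator such as NEAR tests whether two words occur within a few words of each other. The following sketch shows the principle on plain text; the tokenisation, the default distance of 5 words and the function name `near` are assumptions for illustration.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True when word_a and word_b occur within max_distance words."""
    tokens = [token.strip(".,;:!?()\"'").lower() for token in text.split()]
    positions_a = [i for i, token in enumerate(tokens) if token == word_a.lower()]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b.lower()]
    return any(
        abs(a - b) <= max_distance for a in positions_a for b in positions_b
    )


close_text = "Open access journals are free for every reader."
far_text = "Open archives exist. Many other things follow before journals appear."
print(near(close_text, "open", "journals", max_distance=2))
print(near(far_text, "open", "journals", max_distance=3))
```

The first text matches because the two words are 2 positions apart; in the second text they are 8 positions apart, so the test fails.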

96

**--

Meta-search systems:
scheme 1
[Diagram. Labels: User; In; Out; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Labels: User; In; Out; Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Labels: User; In; Out; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Labels: User; In; Out; Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
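
The integration step with removal of repeated results can be sketched as follows. This is an assumption about how a meta-search system might merge ranked lists, not a description of any named product; the URL normalisation rule (ignore "www.", letter case of the host, trailing slashes and query strings) and the function names are illustrative.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Result lists as two hypothetical primary search systems might return them.
engine_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
print(merge_results([engine_a, engine_b]))
```

The merged list keeps the first-ranked results of every engine near the top and shows the DOAJ address only once.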

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Labels: telnet, ftp, ...; Internet / WWW; Databases and file archives accessible through the Internet; CGI, ASP,...; Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; Information accessible only when passwords are used; Rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
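
The core of such a tracking service can be sketched as a comparison of content fingerprints between two visits. The example below works on page texts held in memory, so fetching pages and sending email are left out; the function names, the whitespace normalisation and the example.org addresses are assumptions.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text, ignoring differences in whitespace only."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(stored_fingerprints, current_pages):
    """Return URLs whose content is new or differs from the stored fingerprint.

    stored_fingerprints is updated in place, so the next call compares
    against the most recent visit.
    """
    alerts = []
    for url, text in current_pages.items():
        new_print = fingerprint(text)
        if stored_fingerprints.get(url) != new_print:
            alerts.append(url)
            stored_fingerprints[url] = new_print
    return alerts


stored = {}
first_run = changed_pages(
    stored,
    {"http://example.org/a": "Version 1", "http://example.org/b": "Hello"},
)
second_run = changed_pages(
    stored,
    {"http://example.org/a": "Version 2", "http://example.org/b": "Hello   "},
)
print(first_run)
print(second_run)
```

On the first visit both pages are reported as new; on the second visit only the page whose text really changed is reported.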

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
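
Matching new books against a stored interest profile can be sketched as follows. The profile structure, the book records and the function names are invented for illustration and do not describe how any bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """True when a book fits the profile by keyword or by subject category."""
    title_words = {word.strip(".,:;!?()").lower() for word in book["title"].split()}
    keywords = {keyword.lower() for keyword in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Return the titles of the new books that should trigger an alert."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]


# Invented interest profile and invented list of newly published books.
profile = {"keywords": ["oceanography"], "categories": ["Information science"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Searching the Web", "category": "Information science"},
    {"title": "Cooking for Beginners", "category": "Food"},
]
print(new_book_alerts(new_books, profile))
```

The first book matches through a keyword, the second through its subject category, and the third produces no alert.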

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
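
Simultaneous searching, and its dependence on the availability of the target systems, can be sketched with a thread pool. The two catalogue functions are stand-ins written for this example (no real catalogue or Z39.50 connection is made), and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(catalogues, query):
    """Query every catalogue in parallel; an unavailable target yields None."""

    def safe_search(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            # The target system is unavailable or returned an error.
            return name, None

    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        return dict(pool.map(safe_search, catalogues.items()))


def library_a(query):
    """Stand-in for a working library catalogue."""
    return ["Book about " + query]


def library_b(query):
    """Stand-in for a catalogue that is currently unreachable."""
    raise ConnectionError("catalogue not reachable")


results = search_all({"Library A": library_a, "Library B": library_b}, "water")
print(results)
```

The searcher still receives the hits from the working catalogue, while the unreachable one is marked instead of blocking the whole search.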

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
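
The use of a thesaurus before querying a database can be sketched in Python. This is only a minimal illustration: the tiny thesaurus below and the function names expand_query and to_boolean_query are assumptions made for this example and are not taken from any real thesaurus system.

```python
# Toy thesaurus: every entry and relation below is invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Return the term plus the terms linked to it by the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def to_boolean_query(terms):
    """Combine terms with OR, quoting multi-word terms as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(to_boolean_query(expand_query("sea")))
```

Adding synonyms (UF) and narrower terms (NT) mainly raises recall; choosing a more suitable term found in the thesaurus is what helps precision.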

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
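
Both directions of citation searching can be sketched on a toy data set in Python. The document identifiers and the function names snowball and build_citation_index are hypothetical and serve only as an illustration; a real citation index is of course far larger.

```python
# Toy citation data: document identifiers are invented for illustration.
# Each key cites the documents in its list (references point to the past).
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}

def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {
            ref for doc in frontier for ref in REFERENCES.get(doc, [])
        } - found
        found |= frontier
    return found

def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index

CITED_BY = build_citation_index(REFERENCES)
print(sorted(snowball("seed")))   # older publications (snowball effect)
print(sorted(CITED_BY["seed"]))   # more recent publications citing the seed
```

The reference lists are available in the documents themselves, whereas the inverted table has to be built by a database producer; that is why forward searching requires a citation index.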

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
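
Counting the citations received per article and per author can be sketched as follows in Python. The records are invented for illustration, and the result of a real citation search is only an estimate, since no database covers all citing publications.

```python
from collections import Counter

# Invented records: (citing article, cited article, author of cited article).
CITATIONS = [
    ("p4", "p1", "Author A"),
    ("p5", "p1", "Author A"),
    ("p5", "p2", "Author A"),
    ("p6", "p3", "Author B"),
]

def citations_per_article(citations):
    """Count how often each cited article occurs in the records."""
    return Counter(cited for _, cited, _ in citations)

def citations_per_author(citations):
    """Count how often work by each author is cited in the records."""
    return Counter(author for _, _, author in citations)

print(citations_per_article(CITATIONS).most_common())
print(citations_per_author(CITATIONS).most_common())
```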

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
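
Generating these variants can be automated with a few lines of Python. This is a sketch under one assumption: the web server delivers index.html as the default page of a directory (real servers may use another default file name). The function name url_variants is hypothetical.

```python
def url_variants(url):
    """Return plausible spellings of the same address for a link search.

    Assumption: index.html is the default page of the directory.
    """
    base = url
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]

for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link query, and the result sets are combined.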

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
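
The two components can be sketched in Python. The sketch assumes that the "robot" has already collected the pages: PAGES stands in for the crawled texts, and the URLs, texts and function names are invented for illustration. Real Internet indexes are far more elaborate (ranking, phrase searching, updating).

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

A query is answered from the prepared index only; the pages themselves are not contacted at query time, which is why searching is fast but can return outdated links.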

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a query could contain a maximum of 10 words.)
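
The link-based ranking idea on this slide can be illustrated with a minimal sketch. It is a simplified PageRank-style iteration and is not claimed to be the algorithm that Google actually runs; the small link graph, the damping factor of 0.85 and the function name `rank_pages` are assumptions made only for illustration.

```python
def rank_pages(links, damping=0.85, iterations=100):
    """Simplified link-based ranking (illustrative sketch only).

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Invented example graph: three pages point to "hub", one page points to "lonely".
example_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a"],
    "lonely": [],
}
scores = rank_pages(example_links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, "hub" ends up ranked first because many pages point to it, and "a" ranks high because the "important" page "hub" points to it.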

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
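
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are invented for illustration; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; the synonym source of the real system is unknown here.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("rent ~car brussels")
```

Only the word marked with the tilde is expanded; the other words in the query are left unchanged.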

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
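
Recall and precision, the two measures named on this slide, can be computed as in the following small sketch. The document identifiers are invented, and the function name `precision_recall` is only an illustrative choice.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 documents retrieved, 5 documents are relevant, 2 overlap.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7", "d8", "d9"],
)
```

Here half of the retrieved documents are relevant (precision 0.5), while only two of the five relevant documents were found (recall 0.4).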

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
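
For readers who do not know truncation from other retrieval systems, the following sketch shows what a truncated query term does. The `*` symbol as truncation sign and the small list of index terms are assumptions for illustration; the notation differs between systems.

```python
from fnmatch import fnmatchcase


def truncation_match(pattern, vocabulary):
    """Return the index terms matched by a truncated query term such as 'librar*'."""
    return sorted(w for w in vocabulary if fnmatchcase(w.lower(), pattern.lower()))


# Invented list of index terms.
index_terms = ["library", "libraries", "librarian", "liberty", "librarianship"]
matches = truncation_match("librar*", index_terms)
```

One truncated term thus retrieves several word forms at once, which in a system without truncation have to be typed out and combined manually.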

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
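
How results can be grouped on the fly, using only the text of the retrieved hits, can be sketched roughly as follows. This is a deliberately naive illustration and not the method used by Vivisimo; the stop word list, the sample snippets and the function name `cluster_results` are all assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared term they contain."""
    query_terms = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {
            w for w in snippet.lower().split()
            if w not in STOPWORDS and w not in query_terms
        }
        tokenised.append(words)
    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # The most widely shared term wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented result snippets for the ambiguous query "pascal".
results = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal philosopher biography",
]
clusters = cluster_results("pascal", results)
```

No index of the documents is built beforehand: the headings are derived only from the hits that a search has just returned.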

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
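
The integration with removal of repeated results can be sketched as below. The interleaving strategy, the crude URL normalisation (only a trailing slash is ignored) and the example URLs are assumptions for illustration; real meta-search systems may merge and rank quite differently.

```python
def merge_results(result_lists):
    """Interleave ranked URL lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                # Crude normalisation: treat URLs that differ only in a trailing slash as equal.
                key = url.rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Invented result lists from two primary search systems.
engine_a = ["http://example.org/x", "http://example.org/y"]
engine_b = ["http://example.org/y/", "http://example.org/z", "http://example.org/x"]
merged = merge_results([engine_a, engine_b])
```

Each URL appears only once in the merged list, at the best rank that any of the primary systems gave it.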

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Labels: Internet / WWW; telnet, ftp, ...; databases and file archives accessible through the Internet (CGI, ASP,...); static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
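
The core of such a tracking service can be sketched as a comparison of stored and fresh "fingerprints" of pages. This sketch leaves out the fetching of pages and the sending of email; the URLs, page texts and function names are invented, and actual alert services may detect changes in other ways.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint from the last visit
    current:  dict mapping URL -> page text fetched now
    Returns (modified, new) lists of URLs.
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return modified, new


# Invented example data standing in for pages fetched from the WWW.
stored = {
    "http://example.org/a": fingerprint("Old  text"),
    "http://example.org/b": fingerprint("Same text"),
}
fetched = {
    "http://example.org/a": "New text",
    "http://example.org/b": "Same   text",
    "http://example.org/c": "Brand new page",
}
modified, new = changed_pages(stored, fetched)
```

The distinction between the two returned lists corresponds to the difference between tracking modified pages and finding new pages.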

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
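
Matching new books against a stored interest profile of keywords can be sketched as follows. The book records, the profile and the function name `matching_books` are invented; how a bookshop such as Amazon implements its alerts is not described here.

```python
def matching_books(profile_keywords, new_books):
    """Return titles of new books whose title or subjects contain a profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    matches = []
    for book in new_books:
        text = (book["title"] + " " + " ".join(book.get("subjects", []))).lower()
        words = set(text.replace(",", " ").split())
        if keywords & words:
            matches.append(book["title"])
    return matches


# Invented interest profile and list of newly published books.
profile = ["oceanography", "aquaculture"]
books = [
    {"title": "Introduction to Oceanography", "subjects": ["marine science"]},
    {"title": "Medieval Poetry", "subjects": ["literature"]},
    {"title": "Fish Farming Handbook", "subjects": ["aquaculture", "fisheries"]},
]
alerts = matching_books(profile, books)
```

A profile based on subject categories would work in the same way, with category codes in place of free keywords.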

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
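
Simultaneous, parallel searching can be sketched with a thread pool that sends the same query to every target at once. The three catalogues below are simulated in memory and their names are invented; a real service would query remote systems, for instance using Z39.50, and would have to deal with targets that are slow or unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in catalogues; these replace remote library and bookshop systems.
CATALOGUES = {
    "library_a": ["Marine Biology", "Coastal Engineering"],
    "library_b": ["Marine Ecology", "History of Art"],
    "bookshop_c": ["Marine Biology", "Cooking Basics"],
}


def search_catalogue(name, term):
    """Search one (simulated) catalogue for titles containing the term."""
    return name, [t for t in CATALOGUES[name] if term.lower() in t.lower()]


def search_all(term):
    """Send the same query to every catalogue in parallel and collect the hits per source."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, term) for name in CATALOGUES]
        return dict(f.result() for f in futures)


hits = search_all("marine")
```

Only a simple term search is passed on to every target, which reflects the limited search options mentioned above.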

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
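
The use of a thesaurus before querying a database can be sketched in code. This is only an illustration under assumptions: the thesaurus entry below is invented, the helper name expand_query is hypothetical, and the OR syntax must be adapted to the database that is actually searched.

```python
# Toy thesaurus: every entry and relation below is invented for illustration.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms (Use(d) For)
        "BT": ["body of water"],    # broader terms
        "NT": ["coastal sea"],      # narrower terms
        "RT": ["tide"],             # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return an OR query combining the term with selected thesaurus terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

With this toy data, expand_query("sea", THESAURUS) gives the query: sea OR ocean OR "coastal sea". Adding synonyms and narrower terms mainly serves recall; choosing a more suitable term from the thesaurus serves precision.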

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
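
The two directions of citation searching can be sketched with a small citation graph. The document names and the data are invented, and the function names are hypothetical; a real citation index works on a much larger scale.

```python
# Hypothetical citation data: each key cites the documents in its list.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    todo = list(cites.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The references of a found document are followed as well.
            todo.extend(cites.get(doc, []))
    return found


def citing_documents(target, cites):
    """Citation-index lookup: which documents cite the target?"""
    return sorted(doc for doc, refs in cites.items() if target in refs)
```

Here snowball("seed", CITES) leads to the older documents old_a, old_b and older_c, while citing_documents("seed", CITES) leads to the more recent documents new_x and new_y.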

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
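
The variants listed above can be generated with a small helper, so that none is forgotten. This is a sketch: the function name url_variants is hypothetical and the list of default file names is an assumption that depends on the web server.

```python
def url_variants(url):
    """Return the forms of a URL that other pages might use when linking to it."""
    # Assumed default file names; a particular web server may use other ones.
    default_pages = ("index.html", "index.htm")
    base = url
    for page in default_pages:
        if base.endswith("/" + page):
            base = base[: -len(page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant is then submitted separately in the link search of the chosen search system, using the query syntax described in its help pages.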

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 144

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
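
The simplified statement above (more links received, higher rank) can be sketched as follows. The site names and the function name are invented for illustration, and the real ranking by Google is more refined than a plain count of links.

```python
from collections import Counter


def rank_by_inbound_links(entries, links):
    """Order directory entries by how many links point to each of them."""
    # links is a list of (source, target) pairs.
    inbound = Counter(target for _, target in links)
    # Ties fall back to alphabetical order, as in a plain directory listing.
    return sorted(entries, key=lambda site: (-inbound[site], site))


ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("x.example", "beta.example"),
    ("y.example", "beta.example"),
    ("z.example", "gamma.example"),
]
```

With this toy data, rank_by_inbound_links(ENTRIES, LINKS) puts beta.example first, then gamma.example, then alpha.example, whereas a purely alphabetical listing would start with alpha.example.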

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
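
The mechanism described above (collected pages, a machine-made index, a search function) can be sketched in a few lines. The pages below stand in for the output of a crawler; the URLs, the texts and the function names are invented, and real systems add relevance ranking and much more.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a "robot"; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Sea level and hydrology data",
    "http://example.org/c": "Library catalogue search",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The query is answered from the index only, not by visiting the pages in real time; this is why a search is fast, and also why the results can be out of date until the robot has collected the pages again.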

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine All the Web,
and the Internet database Inktomi are all owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
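
The slide only states that links between pages are taken into account. The following is a minimal sketch of one well-known link-based scoring scheme (a simplified PageRank-style iteration), not Google's actual implementation. The damping value, the number of iterations and the four-page mini-web are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links (simplified PageRank-style iteration).

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Hypothetical mini-web: A, B and C all point to D; D points to A.
web = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
scores = link_rank(web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this mini-web, D ranks first because many pages point to it, and A ranks second because the "important" page D points to it.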

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
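
As a small illustration of the query pattern above, the following sketch builds such a query string. The function name and the example words are hypothetical; only the tilde prefix comes from the slide.

```python
def mark_for_expansion(words, expand):
    """Prefix each word listed in `expand` with a tilde (~)."""
    return " ".join("~" + w if w in expand else w for w in words)


query = mark_for_expansion(["cheap", "car", "rental"], expand={"car"})
print(query)  # cheap ~car rental
```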

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
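
Manual expansion of a query with terms taken from a thesaurus can be sketched as follows. The thesaurus entries are hypothetical, and the assumption is that the target search system accepts OR and parentheses; this differs from system to system.

```python
def expand_query(words, thesaurus):
    """Replace each word by an OR-group of the word and its thesaurus terms."""
    groups = []
    for w in words:
        alternatives = [w] + thesaurus.get(w, [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(w)
    return " ".join(groups)


# Hypothetical thesaurus entry, selected by the searcher.
thesaurus = {"car": ["automobile", "vehicle"]}
expanded = expand_query(["car", "rental"], thesaurus)
print(expanded)  # (car OR automobile OR vehicle) rental
```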

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
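
Recall and precision can be computed for a single search once it is known which documents are relevant. The sketch below uses the standard definitions; the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(p, r)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were retrieved (recall about 0.67).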

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
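
To show what manual truncation means in systems that do support it, here is a minimal sketch that applies right truncation to a local list of titles. The "*" symbol is an assumption (truncation symbols differ between retrieval systems), and the titles are invented.

```python
import re


def matches_truncated(term, text):
    """True if a word in `text` matches `term`.

    A trailing '*' means right truncation: any word starting with the stem.
    """
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


titles = [
    "Library catalogues online",
    "Librarians and the WWW",
    "Liberty and law",
]
found = [t for t in titles if matches_truncated("librar*", t)]
print(found)
```

The truncated term librar* retrieves the titles with "Library" and "Librarians", but not the one with "Liberty".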

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
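
A proximity operator such as NEAR checks whether two words occur close to each other in a text. The sketch below shows the idea on a single sentence; the default distance of 5 words is an assumption, since each retrieval system defines its own value.

```python
def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sentence = "Open access journals are free of charge"
print(near(sentence, "open", "journals", max_distance=2))  # True
print(near(sentence, "open", "charge", max_distance=2))    # False
```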

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) <-> Client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Diagram: User -> an Internet meta-search system -> Internet search system 1 (with its collected database 1) and Internet search system 2 (with its collected database 2) -> WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
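
Clustering on the fly can be illustrated with a very simple sketch that groups result titles by a word they share, using only the retrieved titles themselves. This is not the algorithm used by Vivisimo, which is not described here; the function, the stopword list and the titles are assumptions for illustration.

```python
from collections import Counter, defaultdict


def cluster_results(titles, query, stopwords=("the", "a", "of", "and", "in", "to")):
    """Group result titles under the word they share with most other titles."""
    skip = set(stopwords) | set(query.lower().split())
    tokenised = [[w for w in t.lower().split() if w not in skip] for t in titles]
    # Count in how many titles each remaining word occurs.
    doc_freq = Counter(w for words in tokenised for w in set(words))
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        shared = [w for w in words if doc_freq[w] > 1]
        label = max(shared, key=lambda w: doc_freq[w]) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
hits = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Learn Pascal programming",
    "Pascal philosopher and mathematician",
]
groups = cluster_results(hits, "pascal")
print(groups)
```

The four titles fall into two groups, labelled "programming" and "philosopher", without any processing before the search.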

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
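
The integration of results with removal of repeated results can be sketched as follows. The merging order (rank 1 of every source first, then rank 2, and so on) and the rule that addresses with and without "www." are the same page are assumptions for this illustration, not a description of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path, ignoring 'www.' and a trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated results."""
    seen = set()
    merged = []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_2 = ["http://doaj.org", "http://www.ingentaconnect.com/"]
merged = merge_results([engine_1, engine_2])
print(merged)
```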

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is available through the Internet]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
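
The difference between modified pages and new pages can be sketched with stored fingerprints of page contents. This is only an illustration of the principle, not how any of the services mentioned here works; the page texts and the example.org addresses are hypothetical, and fetching the pages is left out.

```python
import hashlib


def fingerprint(content):
    """Short, fixed-length summary of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """previous: url -> stored fingerprint; current: url -> page text now.

    Returns (new pages, modified pages).
    """
    new_pages = [u for u in current if u not in previous]
    modified = [
        u for u in current
        if u in previous and fingerprint(current[u]) != previous[u]
    ]
    return new_pages, modified


# Fingerprints stored during an earlier visit (hypothetical pages).
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
# Page contents found now.
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
new_pages, modified = detect_changes(previous, current)
print(new_pages, modified)
```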

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
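
Matching new books against a stored interest profile of keywords can be sketched as follows. The record layout (a dictionary with a "title" field) and the book titles are assumptions for illustration; real services may also match on subject categories.

```python
def matching_books(new_books, profile_keywords):
    """Return titles of new books that contain a keyword from the profile."""
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        if keywords & title_words:
            alerts.append(book["title"])
    return alerts


# Hypothetical list of newly published books.
new_books = [
    {"title": "Marine ecology of the North Sea"},
    {"title": "Cooking for beginners"},
    {"title": "Open access and ecology journals"},
]
alerts = matching_books(new_books, profile_keywords=["ecology"])
print(alerts)
```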

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
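
Simultaneous, parallel searching can be sketched with one thread per target system. The catalogue functions below are stand-ins (no real catalogue is contacted), and all names are hypothetical. A target that is not available simply returns no results, which shows why the outcome depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(catalogues, query):
    """catalogues: name -> function(query) returning a list of titles."""
    def run(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            # An unavailable target yields no results instead of
            # making the whole search fail.
            return name, []

    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        return dict(pool.map(run, catalogues.items()))


# Stand-in catalogues for illustration only.
def library_a(query):
    return [f"{query} handbook"]


def bookshop_b(query):
    return [f"{query} atlas", f"{query} guide"]


def library_down(query):
    raise ConnectionError("target system not available")


results = search_all(
    {"library A": library_a, "bookshop B": bookshop_b, "library C": library_down},
    "water",
)
print(results)
```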

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
»BT (= Broader Term): term(s) with broader meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)
»NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
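The use of a thesaurus before querying can be sketched in code. The following is a minimal illustration only: the tiny THESAURUS table and the expand_query function are hypothetical, and the entries for "sea" are made-up sample data, not taken from any real thesaurus system. It adds synonyms (UF) and narrower terms (NT) to a search term and combines them with OR, which tends to increase recall.

```python
# Hypothetical miniature thesaurus: term -> relation code -> list of terms.
# The entries are invented sample data for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "UF": ["ocean"],
        "RT": ["coast"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote every term so that multi-word terms are searched as phrases.
    return " OR ".join('"%s"' % t for t in terms)


query = expand_query("sea")
print(query)
```

Which relations to follow is a design choice: synonyms and narrower terms usually widen the result set safely, whereas broader or related terms may lower precision, so they are left out by default in this sketch.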

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
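Both directions of citation searching can be illustrated with a small sketch. The REFERENCES table and the document names in it are hypothetical sample data, and the two functions are an assumed, simplified model rather than the way any real citation database works. Following reference lists gives the older publications (snowball searching); inverting the reference lists gives a citation index that points to more recent publications.

```python
from collections import defaultdict

# Hypothetical reference lists: document -> earlier documents it cites.
REFERENCES = {
    "paper2005": ["seed2000", "paper1998"],
    "paper2003": ["seed2000"],
    "seed2000": ["paper1995", "paper1998"],
    "paper1998": ["paper1995"],
}


def snowball(seed, references):
    """Follow reference lists back in time and collect the older documents."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for ref in references.get(doc, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents citing it."""
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)
    return cited_by


older = snowball("seed2000", REFERENCES)
newer = build_citation_index(REFERENCES)["seed2000"]
print(sorted(older), sorted(newer))
```

In this model, the number of citations received by a document is simply the size of its entry in the inverted index, here len(newer).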

211

***-

Information retrieval:
using citations (scheme)
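[Scheme on a time axis (past, now, future): starting from the seed document, snowball citation searching leads to older publications in the past, while citation indexing leads to more recent publications that cite it.]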

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
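Generating these variants can be automated. The sketch below is only an illustration: the function name url_variants and the list of default file names are assumptions, and each search system still needs the variants wrapped in its own link-search syntax as described in its help pages.

```python
from urllib.parse import urlsplit


def url_variants(url):
    """Return the spellings under which a page may have been linked to."""
    parts = urlsplit(url)
    path = parts.path
    # Strip a default file name; this short list is an assumption.
    for default in ("index.html", "index.htm"):
        if path.endswith("/" + default):
            path = path[: -len(default)]
            break
    base = "//" + parts.netloc + path.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link search, and the result lists merged.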

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
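The "simply stated" rule above can be sketched as follows. This is a deliberately simplified illustration that only counts incoming links; the LINKS graph and the site names are hypothetical, and the sketch does not claim to reproduce the link analysis that Google actually performs.

```python
from collections import Counter

# Hypothetical link graph: site -> sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "d.example": ["c.example", "b.example"],
}


def rank_by_inlinks(sites, links):
    """Sort directory entries by the number of sites that link to them."""
    # Count each linking site once per target, even if it links repeatedly.
    counts = Counter(
        target for targets in links.values() for target in set(targets)
    )
    # Most linked first; ties fall back to alphabetical order.
    return sorted(sites, key=lambda site: (-counts[site], site))


ranked = rank_by_inlinks(["a.example", "b.example", "c.example"], LINKS)
print(ranked)
```

An alphabetical listing would put a.example first; with link counting it moves to the end because no site in the sample graph links to it.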

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Within the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
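The two components described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the PAGES table stands in for what a crawler ("robot") would have collected, its URLs and texts are hypothetical, and real Internet indexes add relevance ranking and much more. The point is that a query is answered from the prebuilt index, not by visiting the pages in real time.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler has already collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(PAGES)
results = search(INDEX, "marine science")
print(results)
```

Because the index is built beforehand, a page that has changed or disappeared since the last crawl can still show up in the results.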

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
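
The idea of ranking by links can be illustrated with a simplified iterative calculation. This is a sketch only, not the algorithm actually used by Google; the link graph, the damping value and the function name `link_rank` are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from many or from high-scoring pages count more."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it points to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}
scores = link_rank(links)
```

In this small graph, C and D each receive two links, but D ends up with the higher score because one of the pages pointing to it (C) is itself "important".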

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
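
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are hypothetical; the real synonym lists used by Google are not public, so this only shows the principle of turning a marked word into a group of alternatives.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is kept unchanged.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("~cheap car rental")
```

Only the word marked with the tilde is expanded; the other words of the query are left as they are.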

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
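
Recall and precision can be made concrete with a small calculation. The document identifiers below are invented for the example, and the function name `precision_recall` is an assumption; the definitions themselves are the usual ones (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result of one search: 4 documents retrieved,
# 5 documents in the collection are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 5 relevant documents were found (recall 0.4).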

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
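
For readers who are not familiar with truncation, the following sketch shows what manual right truncation does in systems that support it. The vocabulary and the function name `truncation_match` are hypothetical, and the asterisk as truncation symbol is only one common convention.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a pattern with right truncation, e.g. 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical words that occur in an index.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
matches = truncation_match("librar*", vocabulary)
```

Without this feature, the searcher has to type the variants (library, libraries, librarian…) explicitly in the query.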

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
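
A proximity operator checks how close two query words are to each other in a text. The sketch below shows the principle; the function name `near`, the default distance of 5 words and the sample sentence are assumptions for the example.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(
        abs(i - j) <= max_distance for i in positions_a for j in positions_b
    )


# Hypothetical sentence from a document.
text = "open access to scientific information is growing fast in many fields of research"
close_together = near(text, "open", "information")
far_apart = near(text, "open", "research")
```

A plain AND query would accept both pairs of words; a proximity operator accepts only the pair that occurs close together, which often improves precision.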

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
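
Clustering on the fly can be illustrated with a very simple approach: each retrieved snippet is grouped under the most frequent term that it shares with the other snippets. This is not the method used by Vivisimo, which is not described here; the stopword list, the snippets and the function name `cluster_results` are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical, very short stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by"}


def cluster_results(query, snippets):
    """Group snippets under the most frequent non-query term they contain."""
    query_words = set(query.lower().split())

    def terms(snippet):
        return {
            w for w in snippet.lower().split()
            if w not in STOPWORDS and w not in query_words
        }

    # Count in how many snippets each term occurs.
    counts = Counter()
    for snippet in snippets:
        counts.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Ties are broken alphabetically to keep the result reproducible.
            label = max(candidates, key=lambda w: (counts[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical snippets retrieved for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal the philosopher of Port-Royal",
    "Free Pascal compiler for the programming language",
]
clusters = cluster_results("pascal", snippets)
```

The grouping is computed only from the results of the current search, so no processing of the documents is needed before the search.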

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
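
Such an integration of results can be sketched as follows. The result lists are invented, the function name `merge_results` is hypothetical, and treating two URLs as identical when they differ only by a trailing slash is a simplifying assumption; real meta-search systems may use other merging and ranking rules.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                # Crude normalisation: ignore a trailing slash.
                key = url.rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
engine_2 = ["http://example.org/b/", "http://example.org/d"]
merged = merge_results([engine_1, engine_2])
```

The first-ranked results of every system appear first, and a page that is returned by several systems is shown only once.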

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
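
The principle behind tracking changes can be sketched with stored fingerprints of page contents. This is only an illustration under simple assumptions: the page contents are given as strings instead of being downloaded, and the names `fingerprint` and `detect_changes` are hypothetical. It does not describe how Copernic or any alert service works internally.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with freshly collected page contents."""
    report = {"new": [], "changed": [], "unchanged": []}
    for url, content in current.items():
        digest = fingerprint(content)
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical fingerprints stored at the previous visit.
previous = {
    "http://example.org/news": fingerprint("Conference announced for May"),
    "http://example.org/about": fingerprint("About this site"),
}
# Hypothetical contents collected at the current visit.
current = {
    "http://example.org/news": "Conference moved to June",
    "http://example.org/about": "About this site",
    "http://example.org/new-page": "A page that did not exist before",
}
report = detect_changes(previous, current)
```

An alert service would then send the "new" and "changed" entries of such a report to the subscriber by email.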

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus to support searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
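
The use of a thesaurus before searching can be sketched in a few lines of Python. This is only an illustration, not the method of any particular thesaurus system: the dictionary THESAURUS, its terms and the function expand_query are hypothetical, and the OR syntax is assumed to be accepted by the database that is searched afterwards.

```python
# Hypothetical mini-thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym (use(d) for)
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return an OR query of the term plus the terms found via the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; choosing a single, more suitable term found in the thesaurus is the usual way to increase precision.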

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
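
Both directions of citation searching can be sketched as follows, assuming that the reference lists of a small set of documents are known. The data in REFERENCES and the function names are hypothetical; a real citation index works on millions of records, but the principle of inverting the reference lists is the same.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old-a", "old-b"],
    "old-a": ["old-c"],
    "new-x": ["seed"],
    "new-y": ["seed", "old-a"],
}


def snowball(seed, references):
    """Follow reference lists back in time and collect every older document reached."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def citing_documents(target, references):
    """Invert the reference lists, as a citation index does, to find citing documents."""
    return sorted(doc for doc, cited in references.items() if target in cited)


print(snowball("seed", REFERENCES))
print(citing_documents("seed", REFERENCES))
```

The first function only needs the documents themselves; the second needs a database that has processed the reference lists of many later publications.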

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
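
The variants listed above can be generated automatically before they are submitted to a link search. The sketch below is only an illustration: the function name url_variants is hypothetical, and it assumes that the default page of a directory is named index.html or index.htm, which depends on the web server.

```python
def url_variants(url):
    """Return the spellings of a page address that other sites may link to."""
    # Drop the scheme so that the variants start with "//", as in the list above.
    if "://" in url:
        url = "//" + url.split("://", 1)[1]
    variants = [url]
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            directory = url[: -len(index_name)]  # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))
            break
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant then has to be entered in the link search syntax of the chosen search system, which differs from system to system.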

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking better reflects relevance.
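
The simplified idea of ranking by the number of links received can be sketched as follows. This is not the algorithm that Google actually uses, which is more elaborate; the site names, the link data and the function rank_by_inbound_links are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(sites, links):
    """Order sites by the number of links they receive, most-linked first."""
    inbound = Counter(target for _source, target in links)
    # Sort alphabetically first; the stable second sort keeps that order for ties.
    return sorted(sorted(sites), key=lambda site: -inbound[site])


sites = ["b.example", "a.example", "c.example"]
links = [
    ("x.example", "c.example"),
    ("y.example", "c.example"),
    ("x.example", "a.example"),
    ("z.example", "b.example"),
]
print(rank_by_inbound_links(sites, links))
```

Sites that receive the same number of links stay in alphabetical order, so the result falls back to the plain alphabetical listing when no link data are available.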

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW, can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
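
The two components described above can be sketched in Python. This is a strongly simplified illustration: the pages in PAGES are hypothetical and stand in for what a robot has collected, and a real Internet index adds relevance ranking, word normalisation and much more.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image collection",
}


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return []
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "marine science"))
```

A query is answered from the index that was built in advance, not by visiting the pages at query time, which is why such systems can respond quickly but may return outdated links.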

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered;
the first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
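The idea of ranking by links can be illustrated with a small iterative calculation. This is a simplified sketch in the spirit of published link-analysis methods, not Google's actual algorithm; the damping value, the number of iterations and the tiny link graph are all assumptions made for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that a page scores higher when many pages,
    or high-scoring pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d point to c; c points to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph, page c ranks highest because many pages point to it, and page a ranks above b and d because the "important" page c points to it.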

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
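What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and very small, and the rewriting into an OR group is an assumption about the effect of the operator, not a description of how Google implements it.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.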

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
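Recall and precision, mentioned above, can be computed for a result set as in the following sketch. The document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved items that are relevant
    recall    = share of the relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall 2/3).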

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
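For comparison, the following sketch shows what right-hand truncation does in retrieval systems that do support it. The `*` symbol and the small vocabulary are assumptions chosen for the illustration; truncation symbols differ between systems.

```python
def expand_truncation(term, vocabulary):
    """Return all vocabulary words that match a right-truncated term
    such as 'librar*'; a term without '*' matches only itself."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return [term] if term in vocabulary else []

# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
print(expand_truncation("librar*", vocabulary))
```

Without such a feature, the searcher has to type the word variants explicitly, for instance combined with OR.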

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
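Clustering on the fly can be illustrated with a deliberately simple method: each retrieved snippet is filed under the word that it shares with the most other snippets. This is only a sketch under that assumption, not the method used by Vivisimo; the stopword list and the snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "by"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared word they contain,
    ignoring the query words themselves and common stopwords."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        word_sets.append(words - query_words - STOPWORDS)
    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(w for words in word_sets for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        if words:
            # Pick the most shared word; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher",
    "Pascal philosopher and mathematician",
    "Pascal programming language",
    "Turbo Pascal programming compiler",
]
clusters = cluster_results("pascal", snippets)
for label, members in sorted(clusters.items()):
    print(label, members)
```

The grouping is computed from the retrieved results only, at search time, so no processing of the documents before the search is needed.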

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
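Such an integration with removal of repeated results could look like the following sketch. The way in which URLs are compared and the interleaving of the ranked lists are assumptions of this example; real meta-search systems may merge and rank differently.

```python
def normalize(url):
    """Reduce trivial URL variants to one form so duplicates can be detected.
    Lower-casing the whole URL is a simplification of this sketch."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical result lists from two primary search systems.
engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/x"]
engine_b = ["http://vub.ac.be/BIBLIO", "http://example.org/y"]
print(merge_results([engine_a, engine_b]))
```

The first-ranked results of all engines are listed before the second-ranked ones, and a page reported by both engines appears only once.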

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
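The core of such a tracking service can be sketched as a comparison of page fingerprints between two visits. This is a minimal illustration, assuming that the page contents have already been downloaded; the URLs and texts are invented, and sending the alert by email is left out.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(stored, current_pages):
    """Compare current page contents with the stored fingerprints.

    Returns (new_urls, changed_urls) and updates `stored` in place."""
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls

# Hypothetical first and second visit to tracked pages.
stored = {}
first_visit = detect_changes(stored, {"http://example.org/news": "version 1"})
second_visit = detect_changes(stored, {
    "http://example.org/news": "version 2",
    "http://example.org/new-page": "hello",
})
print(first_visit)
print(second_visit)
```

Only the fingerprints need to be kept between visits, not the full text of every tracked page.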

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
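Matching new books against a stored interest profile of keywords can be sketched as follows. The profile and the titles are invented, and matching on whole title words only is a simplifying assumption; it says nothing about how Amazon or any other service implements its alerts.

```python
def matching_books(profile_keywords, new_titles):
    """Return the new titles that contain at least one profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    matches = []
    for title in new_titles:
        words = set(title.lower().split())
        if keywords & words:
            matches.append(title)
    return matches

# Hypothetical interest profile and list of newly published titles.
profile = ["oceanography", "fishing"]
new_titles = [
    "Introduction to Oceanography",
    "Civil Engineering Basics",
    "Fishing in the North Sea",
]
print(matching_books(profile, new_titles))
```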

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
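
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration, assuming a hypothetical mini-thesaurus with invented entries and a target database that accepts quoted terms combined with OR; real systems differ in query syntax.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal sea"], "BT": ["water body"], "RT": ["tide"]},
}

def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return " OR ".join(f'"{t}"' for t in terms)

query = expand_query("sea")
```

Adding synonyms (UF) and narrower terms (NT) is aimed mainly at recall; the searcher can pass other relations, for example `("BT",)`, when a broader search is wanted.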

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
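
Both directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists below are hypothetical; a citation index is modelled here simply as the inverse of the reference lists, which is an assumption about the principle, not a description of any particular product.

```python
# Hypothetical data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old_1", "old_2"],
    "old_1": ["older_1"],
    "new_1": ["seed"],
    "new_2": ["seed", "old_1"],
}

def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index

def snowball(doc, references):
    """Follow references backwards in time, collecting all older documents reached."""
    found, todo = [], list(references.get(doc, []))
    while todo:
        current = todo.pop(0)
        if current not in found:
            found.append(current)
            todo.extend(references.get(current, []))
    return found

citation_index = build_citation_index(REFERENCES)
older = snowball("seed", REFERENCES)        # towards the past
newer = citation_index.get("seed", [])      # towards more recent publications
```

The length of `citation_index[...]` for a document is also the simplest estimate of the number of citations that document has received in the data at hand.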

211

***-

Information retrieval:
using citations (scheme)
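[Scheme along a time axis (past, now, future): starting from the seed
document, snowball citation searching points towards the past, and
citation indexing points towards the future.]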

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
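
The variants listed above can be generated mechanically, so that none is forgotten when the link searches are run. The sketch below assumes that `index.html` (or `index.htm`) is the default document of the directory, which depends on the web server; the function name is hypothetical.

```python
def url_variants(url):
    """Return the common ways the same page address may appear in links."""
    base = url.rstrip("/")
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    # Assumption: index.html is the default document served for the directory.
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then submitted separately, using the link-search syntax of the chosen search system.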

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 147

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
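
The "simply stated" ranking principle can be sketched as follows. This only counts received links in hypothetical data; it is not the actual link analysis used by Google, which is more elaborate.

```python
# Hypothetical link data: page -> pages it links to.
LINKS = {
    "site_a": ["site_b", "site_c"],
    "site_b": ["site_c"],
    "site_c": [],
    "site_d": ["site_c", "site_b"],
}

def count_inlinks(links):
    """Count how many links each page receives from other pages."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in targets:
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts

def rank_directory(entries, links):
    """Sort directory entries by received links (descending), then alphabetically."""
    counts = count_inlinks(links)
    return sorted(entries, key=lambda page: (-counts.get(page, 0), page))

ranking = rank_directory(["site_a", "site_b", "site_c", "site_d"], LINKS)
```

Entries with the same number of received links fall back to alphabetical order, which corresponds to the plain sorting described for the DMOZ directory.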

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
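
The mechanism described above (collect pages, build an index from their texts, search the index instead of the live Internet) can be illustrated with a minimal sketch. The page texts and URLs below are invented stand-ins for what a "robot" would have collected, and the function names are hypothetical; real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "Marine science and oceanography resources",
    "http://example.org/b": "Civil engineering resources and marine structures",
    "http://example.org/c": "Fishing news",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine resources")))
```

The query is answered from the prebuilt index only, which is why a search engine can respond quickly but may miss pages that changed after the last crawl.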

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by separately
using the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
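
The two ranking rules above (many pages point to a page; "important" pages point to it) can be illustrated with a simplified link-analysis sketch. This is an assumption-laden toy in the spirit of published link-ranking methods, not Google's actual algorithm; the function name `link_rank`, the damping value and the tiny link graph are all hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages weigh more.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a and b point to c, c points to d, e points to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, c outranks a because more pages point to it, and d outranks a although each has exactly one incoming link, because the link to d comes from the higher-scoring page c.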

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
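
As an illustration of what automatic synonym expansion amounts to, the sketch below rewrites each ~word into an OR-group. The synonym table and the function name `expand_query` are hypothetical; how Google actually selects synonyms is not described in these slides.

```python
# Hypothetical, hand-made synonym table; a real system would use a far larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its listed synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
# prints: cheap (car OR automobile OR vehicle) rental
```

Only the word marked with a tilde is expanded; "cheap" stays as typed even though the table lists synonyms for it.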

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
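
Recall is the share of all relevant documents that a search retrieves; precision is the share of the retrieved documents that are relevant. The worked example below uses invented document identifiers to compare a hypothetical general engine with a hypothetical specialised one; the numbers illustrate the definitions and are not measurements.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against the relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: four documents are relevant to the information need.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = {"d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"}
specialised_engine = {"d1", "d2", "d3", "x1"}

print(precision_recall(general_engine, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```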

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
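
Clustering "on the fly" means that the groups are derived from the retrieved results themselves at query time. The sketch below is a deliberately naive illustration of that idea: it labels each result title with its most widely shared non-query word. It is not Vivisimo's method, which these slides do not describe; the stopword list, titles and function name are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared word that is not in the query."""
    query_words = set(query.lower().split())

    def terms(title):
        return [w for w in title.lower().split()
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate label occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(terms(title)))

    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most frequent candidate wins; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

TITLES = [
    "Blaise Pascal philosopher",
    "Pascal programming language",
    "Pascal programming tutorial",
    "Pascal philosopher biography",
]
print(cluster_results("pascal", TITLES))
```

With the ambiguous query "pascal", the four invented titles fall into a "philosopher" group and a "programming" group, computed only from the result list, without any pre-processing of the documents.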

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
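
A minimal sketch of such integration follows: several primary search systems are queried in parallel, and their ranked result lists are interleaved with repeated results removed. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would send the query to remote services and would need a more careful URL normalisation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems: each returns a ranked URL list.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]

def engine_two(query):
    return ["http://example.org/b/", "http://EXAMPLE.org/d", "http://example.org/a"]

def normalise(url):
    """Crude normalisation (an assumption): ignore case and a trailing slash."""
    return url.lower().rstrip("/")

def meta_search(query, engines):
    """Query all engines in parallel and merge their results without duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank: all first hits, then all second hits, and so on.
    longest = max(len(results) for results in result_lists)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

print(meta_search("oceanography", [engine_one, engine_two]))
```

Interleaving by rank is only one possible merging policy; it treats the engines as equally trustworthy, which a real system might not do.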

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
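
One simple way in which a tracking program can detect new or modified pages is to store a fingerprint of each page and compare it at the next check. The sketch below assumes the page texts have already been fetched; the function names and example URLs are hypothetical, and this is not a description of how Copernic or any alert service works internally.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_for_changes(known, snapshot):
    """Compare fresh page contents with the fingerprints of the previous check.

    `known` maps URL -> stored fingerprint and is updated in place.
    `snapshot` maps URL -> current page text.
    Returns (new_urls, changed_urls).
    """
    new_urls, changed_urls = [], []
    for url, content in snapshot.items():
        digest = fingerprint(content)
        if url not in known:
            new_urls.append(url)
        elif known[url] != digest:
            changed_urls.append(url)
        known[url] = digest
    return new_urls, changed_urls

known = {}
first = check_for_changes(known, {"http://example.org/p": "version 1"})
second = check_for_changes(known, {"http://example.org/p": "version 2",
                                   "http://example.org/q": "hello"})
print(first)   # (['http://example.org/p'], [])
print(second)  # (['http://example.org/q'], ['http://example.org/p'])
```

An alert service would run such a check on a schedule and send the two lists to the subscriber by email.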

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
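
How a stored interest profile can be matched against newly published books is sketched below. The profile structure, field names and book records are hypothetical; the slides do not specify how any particular bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """True when a new book matches the stored keywords or subject categories."""
    title_words = set(book["title"].lower().split())
    keywords = {k.lower() for k in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

# Hypothetical profile stored on the server for one user.
profile = {"keywords": ["oceanography"], "categories": ["Engineering"]}

new_books = [
    {"title": "Introduction to Oceanography", "category": "Science"},
    {"title": "Bridge Design", "category": "Engineering"},
    {"title": "Garden Birds", "category": "Nature"},
]
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)  # ['Introduction to Oceanography', 'Bridge Design']
```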

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
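
Using thesaurus relations to formulate a query can be sketched as follows. The miniature thesaurus, its entries and the function `expand_query` are assumptions made for this illustration; they are not taken from any of the thesaurus systems mentioned in these slides.

```python
# Hypothetical miniature thesaurus; the entries are invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["water body"],      # broader term
        "NT": ["coastal sea"],     # narrower term
        "RT": ["marine science"],  # related term
        "UF": ["ocean"],           # synonym
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly increases recall; adding broader or related terms would retrieve still more documents, probably at the cost of precision.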

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
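
The two directions of citation searching can be sketched as below. The citation data and the function names `snowball` and `citing_documents` are invented for this illustration; a real citation index covers millions of documents.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "D2005": ["C2000", "B1995"],
    "C2000": ["B1995", "A1990"],
    "B1995": ["A1990"],
    "A1990": [],
}


def snowball(seed):
    """Follow reference lists back in time, starting from the seed document."""
    found, to_visit = set(), [seed]
    while to_visit:
        for cited in REFERENCES.get(to_visit.pop(), []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def citing_documents(seed):
    """Citation index lookup: more recent documents that cite the seed."""
    return {doc for doc, refs in REFERENCES.items() if seed in refs}


print(sorted(snowball("C2000")))
print(sorted(citing_documents("C2000")))
```

The first function needs only the reference lists of the documents at hand; the second needs an index built over the reference lists of many other documents, which is what a citation index provides.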

211

***-

Information retrieval:
using citations (scheme)
Seed document
• Snowball citation searching: towards the past
• Citation indexing: towards the future
Time: Past, Now, Future

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
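
Generating these variants can be sketched as follows. The function `url_variants` is invented for this illustration, and it assumes that the default page of the site is named `index.html`, which is not true for every web server.

```python
def url_variants(url):
    """Return the common spellings of a URL that other pages may link to."""
    base = url.rstrip("/")
    # Assumption: the default page of the directory is called index.html.
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, and the results can be combined.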

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
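
The simplified statement above (more received links, higher rank) can be sketched as below. The link data and the function `rank_by_inbound_links` are invented for this illustration; the link analysis really used by Google is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (linking page, linked site). Invented for illustration.
LINKS = [
    ("page1", "site-b"), ("page2", "site-b"), ("page3", "site-a"),
    ("page4", "site-b"), ("page5", "site-c"), ("page6", "site-a"),
]


def rank_by_inbound_links(sites, links):
    """Order directory entries by the number of links each site receives."""
    inbound = Counter(target for _, target in links)
    # Ties fall back to alphabetical order, as in a plain alphabetical listing.
    return sorted(sites, key=lambda s: (-inbound[s], s))


print(rank_by_inbound_links(["site-a", "site-b", "site-c"], LINKS))
```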

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW
• a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
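
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the pages and URLs are invented, the “robot” is replaced by a fixed in-memory dictionary, and real systems use far more elaborate tokenisation and ranking.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering resources and marine structures",
    "http://example.org/c": "Directory of science education materials",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
# prints: ['http://example.org/a']
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is the point made on the previous slide.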

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web no longer builds
its own unique database of WWW sites, but relies on the
same WWW database as its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
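
The two ranking rules on this slide (many pages point to it, “important” pages point to it) can be illustrated with a simplified iterative link-analysis sketch. This is not Google's actual implementation; the damping value 0.85, the number of iterations and the miniature link graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages from their incoming links.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical miniature web: three pages point to "hub", one points to "leaf".
LINKS = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "leaf"],
    "hub": ["a"],
    "leaf": [],
}
RANKS = link_rank(LINKS)
print(max(RANKS, key=RANKS.get))
# prints: hub
```

Page "a" receives only one link, but it comes from the highly ranked "hub", so "a" ends up well above "b" and "c": a link from an important page counts for more.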

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
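
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the OR-group output is only one plausible way to write the expanded query; how Google expands a tilde term internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word prefixed with '~' into an OR-group of synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + synonyms.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the bare term.
                parts.append(term)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("~cheap car rental"))
# prints: (cheap OR inexpensive OR affordable) car rental
```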

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
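
Recall and precision, mentioned above, can be computed for a single search as in the following minimal sketch. The document identifiers are invented; in practice the set of relevant documents is rarely known completely, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 5 relevant documents exist,
# 2 of the retrieved documents are relevant.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d7", "d8", "d9"]))
# prints: (0.5, 0.4)
```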

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
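
For readers unfamiliar with truncation: in retrieval systems that support it, a term such as librar* matches every word that starts with librar. A minimal sketch of that behaviour, using an asterisk as the truncation symbol (an assumption; systems differ in the symbol they use):

```python
import re

def truncation_to_regex(term):
    """Translate a term such as 'librar*' into a whole-word pattern."""
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

def matches(term, text):
    """Return the distinct words in text matched by the truncated term."""
    regex = truncation_to_regex(term)
    return sorted({m.group(0).lower() for m in regex.finditer(text)})

print(matches("librar*", "Libraries and librarians use the library catalogue."))
# prints: ['librarians', 'libraries', 'library']
```

Without truncation, the searcher has to type each of these word forms explicitly and combine them with OR.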

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
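
A proximity operator such as NEAR requires that two words occur within a limited number of words of each other. A minimal sketch, assuming a simple word-based distance (the function name and the default distance of 5 are illustrative choices, not taken from any particular system):

```python
import re

def near(text, word_a, word_b, distance=5):
    """True when word_a and word_b occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)

TEXT = "Open access archives give free access to scientific research articles"
print(near(TEXT, "open", "archives", distance=2))   # prints: True
print(near(TEXT, "open", "articles", distance=3))   # prints: False
```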

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
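
The idea of grouping results on the fly can be illustrated with a deliberately naive sketch: the result titles returned for a query are grouped under the words they most frequently share. This is not Vivisimo's algorithm, which is not described here; the stopword list, the function name and the sample titles are assumptions.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles, max_clusters=2):
    """Group result titles under the most frequent words not in the query."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def words_of(title):
        return set(re.findall(r"\w+", title.lower())) - query_words - STOPWORDS

    counts = Counter(word for title in titles for word in words_of(title))
    # Most frequent words first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    labels = [word for word, _ in ordered[:max_clusters]]

    clusters = defaultdict(list)
    for title in titles:
        for label in labels:
            if label in words_of(title):
                clusters[label].append(title)
                break
        else:
            clusters["other"].append(title)
    return dict(clusters)

TITLES = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Thoughts of the philosopher Pascal",
    "Pascal unit of pressure",
]
for label, group in cluster_results("pascal", TITLES).items():
    print(label, group)
```

Only the retrieved titles are analysed, at query time, so no pre-processing of the document collection is needed.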

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
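
Such an integration with removal of repeated results can be sketched as follows. The normalisation rule (ignore case in the host name, a leading "www." and a trailing slash) is an assumption made for this example; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(*result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_A = ["http://www.dogpile.com/", "http://vivisimo.com/"]
ENGINE_B = ["http://dogpile.com", "http://www.kartoo.com"]
print(merge_results(ENGINE_A, ENGINE_B))
# prints: ['http://www.dogpile.com/', 'http://vivisimo.com/', 'http://www.kartoo.com']
```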

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
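
How a tracking service can distinguish a modified page from a new one can be sketched with content fingerprints. This is a minimal illustration, assuming that the page text has already been fetched; the function names and the in-memory store are hypothetical, and real services must also ignore irrelevant changes such as dates or advertisements.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_page(url, content, known):
    """Return 'new', 'changed' or 'unchanged' and update the stored fingerprint."""
    digest = fingerprint(content)
    previous = known.get(url)
    known[url] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "changed"

store = {}  # url -> fingerprint seen at the previous visit
print(check_page("http://example.org/page", "version 1", store))  # prints: new
print(check_page("http://example.org/page", "version 1", store))  # prints: unchanged
print(check_page("http://example.org/page", "version 2", store))  # prints: changed
```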

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
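
Matching new books against a stored interest profile can be sketched as follows. The book records, the field names and the matching rule (any keyword occurring in the title or in the subject categories) are assumptions for illustration; they do not describe how Amazon implements its service.

```python
def matching_books(profile_keywords, new_books):
    """Return the titles of new books that match at least one profile keyword."""
    keywords = {keyword.lower() for keyword in profile_keywords}
    alerts = []
    for book in new_books:
        words = set(book["title"].lower().split())
        words |= {subject.lower() for subject in book.get("subjects", [])}
        if keywords & words:
            alerts.append(book["title"])
    return alerts

# Hypothetical list of newly published books.
NEW_BOOKS = [
    {"title": "Coastal Oceanography Today", "subjects": ["marine science"]},
    {"title": "Bridge Design", "subjects": ["civil engineering"]},
    {"title": "Sustainable Fisheries", "subjects": ["fishing", "marine science"]},
]
print(matching_books(["oceanography", "fishing"], NEW_BOOKS))
# prints: ['Coastal Oceanography Today', 'Sustainable Fisheries']
```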

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
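
The principle of simultaneous searching can be sketched as follows: one query is sent in parallel to several targets, the answers are merged, duplicates are dropped, and a target that is not available is reported instead of stopping the whole search. The three catalogue functions are stand-ins invented for this sketch; they do not correspond to the interface of any real catalogue.

```python
from concurrent.futures import ThreadPoolExecutor


def catalogue_a(query):
    # Stand-in for a library catalogue; a real target would be queried over the network.
    records = ["Marine Ecology", "Ocean Circulation"]
    return [r for r in records if query.lower() in r.lower()]


def catalogue_b(query):
    records = ["Ocean Circulation", "Ocean Chemistry"]
    return [r for r in records if query.lower() in r.lower()]


def catalogue_down(query):
    # Simulates a target system that is unavailable.
    raise ConnectionError("catalogue not reachable")


def meta_search(query, targets):
    """Query all targets in parallel; return merged, de-duplicated titles and failed targets."""
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [(name, pool.submit(search, query)) for name, search in targets.items()]
        for name, future in futures:
            try:
                for title in future.result():
                    if title not in merged:
                        merged.append(title)
            except ConnectionError:
                failed.append(name)
    return merged, failed


targets = {"A": catalogue_a, "B": catalogue_b, "C": catalogue_down}
titles, unavailable = meta_search("ocean", targets)
```

The sketch also shows why the search options are limited: only a query form that every target understands (here a plain substring) can be passed on to all of them.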

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                  RECOMMENDED SYSTEMS
To find book titles about a specific subject/topic   Library of Congress, British Library, (Amazon)
To search for book titles published before 1990      national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                         Library of Congress, British Library, Infoball
To find the price of a book                          Global Books in Print, Infoball, online bookshops
To be informed regularly about new books             Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined coverage of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also covers audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a given Term, a thesaurus offers:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
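
The use of a thesaurus before searching can be sketched as a simple query expansion. The thesaurus entries below are invented for illustration, and the `OR` syntax with quoted phrases is an assumption; the query syntax that is really required depends on the database to be searched.

```python
# Tiny hand-made thesaurus; the entries are invented for illustration only.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"], "RT": ["tide"], "UF": ["ocean"]},
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR-query from a term plus the terms linked by the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = expand_query("sea")
broad_query = expand_query("sea", relations=("UF", "NT", "RT"))
```

Adding synonyms and narrower terms mainly raises recall; adding related or broader terms widens the search further and can lower precision, so the choice of relations is left to the searcher.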

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
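
Both directions of citation searching can be derived from the same data, the reference lists of documents. The sketch below uses made-up document identifiers: following reference lists gives the older publications (snowball effect), while inverting the reference lists gives a small citation index that points to more recent publications.

```python
from collections import defaultdict

# Made-up documents: each identifier maps to the list of documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1990", "old-1980"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1995"],
}


def snowball(doc, references):
    """Backward search: follow reference lists repeatedly to reach older documents."""
    found, queue = [], list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(references.get(current, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, which documents cite it?"""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return index


older = snowball("seed-2000", REFERENCES)
citation_index = build_citation_index(REFERENCES)
newer = citation_index["seed-2000"]
```

The backward search needs only the documents themselves; the forward search needs an index built over many documents, which is why it depends on a citation database.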

211

***-

Information retrieval:
using citations (scheme)
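[Scheme: a time axis running from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.]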

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
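
Estimating the number of citations received by an article or by an author comes down to counting in the results of a citation search. The sketch below works on made-up data and makes two simplifying assumptions: every author of a cited article gets full credit, and self-citations are not filtered out.

```python
from collections import Counter

# Made-up data: which articles cite which, and who wrote each article.
CITES = {
    "paper-C": ["paper-A", "paper-B"],
    "paper-D": ["paper-A"],
    "paper-E": ["paper-A", "paper-D"],
}
AUTHORS = {"paper-A": ["Smith"], "paper-B": ["Jones"], "paper-D": ["Smith", "Lee"]}

# Citations received per article: count each appearance in a reference list.
per_article = Counter(cited for refs in CITES.values() for cited in refs)

# Citations received per author: credit every author of a cited article.
per_author = Counter()
for article, count in per_article.items():
    for author in AUTHORS.get(article, []):
        per_author[author] += count
```

Such counts are only estimates: they depend on which citing documents are covered by the database that is searched.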

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.
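
What a link search does can be sketched on a handful of made-up pages: the hyperlinks of each page are extracted, and the pages that link to the known URL are reported. The page addresses and contents are invented; a real search engine answers such a question from its own database and with its own query syntax.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target, pages):
    """Return the addresses of the pages that contain a hyperlink to the target URL."""
    citing = []
    for address, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            citing.append(address)
    return citing


# Made-up pages standing in for the database of a search engine.
PAGES = {
    "http://example.org/one": '<p>See <a href="http://example.com/report.html">the report</a>.</p>',
    "http://example.org/two": "<p>The address http://example.com/report.html is only mentioned.</p>",
    "http://example.org/three": '<a href="http://example.com/other.html">other</a>',
}
sitations = pages_linking_to("http://example.com/report.html", PAGES)
```

The second page mentions the address in its text without linking to it, so a search restricted to hyperlinks does not find it.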

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
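
Generating the variants to search for can be sketched as below. The function name `url_variants` is hypothetical, and the sketch assumes that the default page of the site is called `index.html`; a web server may use another default file name.

```python
def url_variants(url):
    """Return the common spellings of a site address that others may have linked to."""
    base = url.rstrip("/")
    for default_page in ("/index.html", "/index.htm"):
        if base.endswith(default_page):
            base = base[: -len(default_page)]
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then submitted as a separate link search, and the results are combined.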

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
an Internet index / search engine with specialised searching
in the hyperlink fields of web pages.
But we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 149

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
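
The "simply stated" version of this ranking can be sketched as counting incoming links and sorting on that count. The link graph below is made up, and the sketch is not the actual algorithm of Google, which is more refined than a plain count.

```python
# Made-up link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-d": ["site-c", "site-b"],
    "site-c": [],
}


def rank_by_inlinks(category_sites, links):
    """Order the sites of a directory category by the number of links they receive."""
    inlinks = {site: 0 for site in category_sites}
    for source, targets in links.items():
        for target in set(targets):          # count each linking site once
            if target in inlinks and target != source:
                inlinks[target] += 1
    # Most-linked first; alphabetical order breaks ties.
    return sorted(category_sites, key=lambda site: (-inlinks[site], site))


ranking = rank_by_inlinks(["site-a", "site-b", "site-c"], LINKS)
```

With a purely alphabetical listing, as in DMOZ, the order would be site-a, site-b, site-c; the link count reverses it here.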

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram] Within the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
software system)
• a search system with a user interface in a WWW form, which
allows the user to search through that database
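The mechanism described above can be illustrated with a minimal sketch of an inverted index. This is only an illustration under stated assumptions: the page texts and the example.org URLs are hypothetical, the function names are invented for this sketch, and real Internet indexes are far more elaborate.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as a crawler ("robot") might have collected them.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine science")
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such systems can respond quickly.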

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
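The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis computation. This is a sketch only, assuming a tiny hypothetical link graph and a commonly used damping value; it is not presented as the algorithm that Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages count more."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
rank = link_rank(links)
ordered = sorted(rank, key=rank.get, reverse=True)
```

In this graph, page c ranks highest because three pages point to it. Page a has only one incoming link, but that link comes from the “important” page c, so a still ranks above b and d.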

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
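The effect of the tilde can be imitated with a small sketch. The synonym table and the function name below are hypothetical, and the sketch only shows the idea of automatic query expansion; it makes no claim about how Google implements this internally.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling", "fisheries"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.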

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
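Recall and precision, mentioned above as measures of a result set, can be computed as in the following minimal sketch. The document identifiers are hypothetical, and in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and hypothetical set of truly relevant documents.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were retrieved (recall about 0.67).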

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
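For readers who have not used these features, the sketch below shows what truncation and a proximity operator do in a retrieval system that supports them. The function names, the asterisk as truncation symbol and the sample sentence are assumptions made for this illustration.

```python
import re


def words_of(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def truncation_match(pattern, text):
    """Return the words in text that start with the stem of a pattern like 'librar*'."""
    stem = pattern.rstrip("*").lower()
    return [word for word in words_of(text) if word.startswith(stem)]


def near(text, word1, word2, distance=3):
    """True when word1 and word2 occur within `distance` words of each other."""
    words = words_of(text)
    positions1 = [i for i, word in enumerate(words) if word == word1.lower()]
    positions2 = [i for i, word in enumerate(words) if word == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)


text = "Libraries and librarians catalogue marine science books for the library."
matches = truncation_match("librar*", text)
close = near(text, "marine", "books", distance=2)
far = near(text, "libraries", "library", distance=3)
```

Truncation retrieves all word forms that share a stem in one query; a proximity operator requires the query words to occur close together, which is stricter than a plain Boolean AND.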

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out) <-> client computer + WWW client program <-> Internet / WWW <-> WWW server computer <-> WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> client computer + WWW client program <-> Internet / WWW <-> WWW server computer <-> WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
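Clustering on the fly can be illustrated with a toy sketch that groups result snippets under the terms they share, using the ambiguous query pascal. The snippets, the stopword list and the grouping rule are assumptions made for this illustration; the sketch does not claim to reproduce the method used by Vivisimo.

```python
from collections import Counter, defaultdict
import re

# Small, hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "by", "was"}


def cluster_results(query, results, max_clusters=3):
    """Group (url, snippet) results under the most frequent non-query terms."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokens = {}
    doc_freq = Counter()
    for url, snippet in results:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words -= query_words | STOPWORDS
        tokens[url] = words
        doc_freq.update(words)
    # Headings: terms occurring in at least two results, most frequent first
    # (ties are broken alphabetically so that the output is deterministic).
    candidates = sorted(
        (term for term, count in doc_freq.items() if count >= 2),
        key=lambda term: (-doc_freq[term], term),
    )
    headings = candidates[:max_clusters]
    clusters = defaultdict(list)
    for url, _ in results:
        label = next((h for h in headings if h in tokens[url]), "other")
        clusters[label].append(url)
    return dict(clusters)


# Hypothetical search results for the ambiguous query "pascal".
results = [
    ("u1", "Blaise Pascal, French philosopher and mathematician"),
    ("u2", "Pascal programming language tutorial"),
    ("u3", "The philosopher Pascal and his wager"),
    ("u4", "Free Pascal compiler for the Pascal programming language"),
    ("u5", "Pascal unit of pressure"),
]
clusters = cluster_results("pascal", results)
```

The grouping is computed from the retrieved snippets alone, at search time, so no pre-processing of the document collection is needed.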

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
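The integration step with removal of repeated results can be sketched as follows. The result lists, the example URLs and the normalisation rule are hypothetical; actual meta-search systems may merge and rank results quite differently.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked result lists and keep only the first occurrence of each page."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                key = normalize(url)
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/", "http://example.com/page"]
engine_b = ["http://example.org", "http://example.net/"]
merged = merge_results([engine_a, engine_b])
```

Interleaving by rank position keeps the top result of each primary system near the top of the merged list, while the normalised key catches the same page reported under slightly different addresses.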

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is available through the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Outside the WWW: telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
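How such a tracking service can notice a modified or new page is sketched below: it stores a fingerprint of each page and compares it at the next visit. The URLs and page contents are hypothetical, and fetching the pages and sending the e-mail alert are left out.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 digest of the page content (bytes)."""
    return hashlib.sha256(content).hexdigest()


def check_pages(current_pages, stored):
    """Compare current page contents with the stored fingerprints.

    Returns (new_urls, changed_urls) and updates `stored` in place.
    """
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls


# Three hypothetical visits to the same tracked pages.
stored = {}
first = check_pages({"http://example.org/": b"version 1"}, stored)
second = check_pages(
    {"http://example.org/": b"version 2", "http://example.org/new": b"hello"},
    stored,
)
third = check_pages({"http://example.org/": b"version 2"}, stored)
```

Only the fingerprints have to be kept between visits, not the full page contents. A fingerprint reacts to any change, however small, so a practical service would probably ignore trivial differences such as dates or counters.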

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are even available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
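A minimal sketch of how such a stored interest profile could be matched against newly published books. The `InterestProfile` class, the `matches` function and the sample data are hypothetical illustrations, not the mechanism of any real bookshop system.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical stored profile: keywords and/or subject categories."""
    keywords: set[str] = field(default_factory=set)
    subjects: set[str] = field(default_factory=set)


def matches(profile, title, subject):
    """Return True when a new book fits the profile (keyword or subject hit)."""
    words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = bool(words & {k.lower() for k in profile.keywords})
    subject_hit = subject.lower() in {s.lower() for s in profile.subjects}
    return keyword_hit or subject_hit


# Illustrative data only: (title, subject category) of newly published books.
profile = InterestProfile(keywords={"hydrology", "estuaries"},
                          subjects={"marine biology"})
new_books = [
    ("Introduction to Hydrology", "earth sciences"),
    ("Coral Reef Fishes", "marine biology"),
    ("Medieval Trade Routes", "history"),
]

# Titles for which the user would receive an alert.
alerts = [title for title, subject in new_books if matches(profile, title, subject)]
print(alerts)
```

A real service would run such a match whenever new records are added and send the hits by e-mail.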

155

****

Online Public Access Catalogues of
libraries
• The catalogues of libraries can be useful,
mainly to find older books.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
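The idea of one search action fanned out over several target systems can be sketched as follows. The catalogue names and contents are hypothetical stand-ins; a real meta-search service would query remote systems (for instance over HTTP or Z39.50) instead of local lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues of libraries and bookdealers.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Management"],
    "bookdealer_c": [],  # simulates a target system that is unavailable
}


def search_one(name, query):
    """Search a single catalogue with a simple case-insensitive substring match."""
    if name == "bookdealer_c":
        raise ConnectionError(f"{name} is unavailable")
    return [title for title in CATALOGUES[name] if query.lower() in title.lower()]


def meta_search(query):
    """Send one query to all catalogues in parallel and merge the results."""
    merged, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
        for name, future in futures.items():
            try:
                for title in future.result():
                    # Remember which catalogues hold each title.
                    merged.setdefault(title, []).append(name)
            except ConnectionError:
                failed.append(name)
    return merged, failed


results, unavailable = meta_search("ocean")
print(results)
print(unavailable)
```

The sketch shows both points made above: the result depends on which targets respond, and only the simplest query form (here a substring) can be passed to every target.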

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
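The procedure described above (look up a term in a thesaurus first, then use the found terms in the query) could be sketched like this. The mini-thesaurus, the `expand_query` function and the `OR` syntax are assumptions for illustration; each real search system has its own query syntax.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "NT": ["coastal waters"],   # narrower terms
        "BT": ["water body"],       # broader terms
        "RT": ["marine"],           # related terms
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly raises recall; broader and related terms are left out by default here because they tend to lower precision.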

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
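Both directions can be sketched with a small, hypothetical set of reference lists: following references leads to older work, while inverting the reference lists gives a citation index that leads to more recent work. The document names and data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed_2000": ["paper_1990", "paper_1995"],
    "paper_1995": ["paper_1990", "paper_1985"],
    "paper_2003": ["seed_2000"],
    "paper_2004": ["seed_2000", "paper_1995"],
}


def snowball(seed, depth):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in REFERENCES.get(doc, [])} - found
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = defaultdict(set)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].add(citing)
    return index


older = snowball("seed_2000", depth=2)
newer = build_citation_index(REFERENCES)["seed_2000"]
print(sorted(older))
print(sorted(newer))
```

The size of an entry in the inverted index is the number of citations received by that document within the data set.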

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (past, now, future) with the seed document;
snowball citation searching leads to the past,
citation indexing leads to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
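The variants listed above can be generated mechanically before running the link searches. This is a minimal sketch; the function name `url_variants` is hypothetical, and it assumes that the default document of the directory is called `index.html` or `index.htm`.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the variants of a URL that other pages may have used to link to it."""
    # Drop the scheme ("http://") and any leading slashes.
    rest = url.split("://", 1)[-1].lstrip("/")
    base = rest
    for name in index_names:
        if rest.endswith("/" + name):
            base = rest[: -len(name)]  # keeps the trailing slash for now
            break
    base = base.rstrip("/")
    # Assumption: the default document of the directory is index.html.
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then submitted as a separate link query, using the syntax required by the chosen search system.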

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 150


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, on the assumption that
such a ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
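
The mechanism described above can be illustrated with a minimal sketch in Python. The page texts, the example.org URLs and the function names (build_index, search) are hypothetical. A real system would collect pages with a crawler ("robot"); this sketch starts from a small in-memory collection and supports only AND queries.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler would collect.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the index alone; the pages themselves are not visited at query time, which is why such systems can respond quickly.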

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
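
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis sketch. This is not Google's actual algorithm. The link graph, the damping value 0.85 and the function name rank_pages are assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Simplified link-analysis ranking by repeated score propagation.

    links maps each page to the list of pages it points to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score

# Hypothetical link graph: A and B point to C, C points to A, D points to B.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["B"]}
scores = rank_pages(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this graph C ranks first because two pages point to it. A and B each receive one link, but A ranks above B because its link comes from the highly ranked page C.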

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
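
What such an expansion amounts to can be sketched as follows. The synonym table and the function name expand_query are hypothetical. The slides do not say how Google derives its synonyms, so this only illustrates the principle of replacing a ~word by an OR-group.

```python
# Hypothetical synonym table; how Google derives synonyms is not stated here.
SYNONYMS = {
    "fish": ["fishes", "fishing"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("pollution ~sea coast"))
```

Only the marked word is expanded; the other query words are left untouched, as in the query pattern above.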

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity in the
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity in the
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity in the
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
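
Recall and precision, mentioned above, can be computed as in the following sketch. The document identifiers and the function name precision_recall are hypothetical. Precision is the fraction of retrieved items that are relevant; recall is the fraction of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant items."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are relevant in total.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 1/2), and two of the three relevant documents were found (recall 2/3).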

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
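
For readers unfamiliar with truncation, the following sketch shows what manual right-truncation does in systems that support it. The vocabulary, the "*" truncation symbol and the function name expand_truncation are assumptions for illustration; actual retrieval systems use various symbols.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a right-truncated term such as 'biolog*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == term.lower())

# Hypothetical index vocabulary.
vocabulary = ["biology", "biological", "biologist", "biography", "hydrology"]
print(expand_truncation("biolog*", vocabulary))
```

A system without truncation requires the searcher to type each of these word variants separately.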

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
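
Clustering "on the fly" can be illustrated with a deliberately simple sketch that works only on the result snippets returned for a query. This is not Vivisimo's method, which is not described in these slides. The snippets, the stop-word list and the function name cluster_results are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small hypothetical stop-word list.
STOPWORDS = {"the", "a", "of", "and", "in", "by", "for", "to"}

def cluster_results(snippets, query):
    """Group snippets under the most widely shared term they contain."""
    query_words = set(query.lower().split())

    def terms(text):
        words = set(re.findall(r"[a-z]+", text.lower()))
        return words - STOPWORDS - query_words

    # Count in how many snippets each term occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Most shared term first; ties are broken alphabetically.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for programming",
    "Thoughts of the philosopher Blaise Pascal",
]
clusters = cluster_results(snippets, "pascal")
for label, members in clusters.items():
    print(label, members)
```

No document is processed before the search: the grouping is computed from the retrieved snippets only, and the ambiguous query "pascal" is split into a programming group and a group about the person.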

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
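
The integration step with removal of repeated results can be sketched as follows. The result lists, the normalisation rule and the function names are assumptions; real meta-search systems may merge and rank in other ways.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal.

    Assumption of this sketch: scheme, query string and a trailing slash are ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return host + path

def merge_results(result_lists):
    """Interleave ranked result lists round-robin and skip repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://example.org/a", "http://example.org/b"]
engine_2 = ["http://example.org/b/", "http://example.org/c", "http://example.org/a"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Round-robin interleaving keeps the top hits of every source near the top of the merged list, and each page is shown only once.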

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet (telnet, ftp, WWW, ...)]
• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic, which is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
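
The difference between a modified page and a new page can be sketched as follows. The URLs, page contents and function names are hypothetical, and the sketch works on page contents that have already been fetched; it does not say how any particular alert service works.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page contents.

    previous: dict url -> fingerprint stored at the last check
    current:  dict url -> page content fetched now
    Returns (new_urls, changed_urls, updated_fingerprints).
    """
    new_urls, changed_urls = [], []
    updated = dict(previous)
    for url, content in current.items():
        digest = fingerprint(content)
        if url not in previous:
            new_urls.append(url)        # page was not known before
        elif previous[url] != digest:
            changed_urls.append(url)    # known page with modified content
        updated[url] = digest
    return new_urls, changed_urls, updated

# Hypothetical state from the last check and a new round of fetched pages.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
current = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
new_urls, changed_urls, updated = detect_changes(previous, current)
print(new_urls, changed_urls)
```

Storing only fingerprints keeps the stored state small; an alert service would then send the lists of new and changed URLs to the subscriber.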

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
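
A minimal sketch of how such an interest profile might be matched against newly published books. The profile layout, the field names and the sample records are assumptions made for illustration; they do not describe how any particular bookshop implements its alerting service.

```python
def matches_profile(book, profile):
    """Return True if a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    # A keyword matches when it occurs as a whole word in the title.
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    # A subject category matches when the book is filed under it.
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical profile stored on the server of the system.
profile = {"keywords": ["oceanography"], "categories": {"Marine science"}}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Fish Stocks of the North Sea", "category": "Marine science"},
    {"title": "Medieval Cooking", "category": "History"},
]

# Titles for which the user would receive an alert.
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```

In this sketch the first book is selected through a keyword and the second through a subject category, which mirrors the two forms of profile listed above.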

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
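
The idea of simultaneous searching can be sketched as follows. This is only an illustration under stated assumptions: the three catalogues are small in-memory lists with invented names, whereas a real meta-search service would forward the query to remote target systems and would depend on their availability.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of libraries and book dealers.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Ecology", "Ocean Chemistry"],
    "bookdealer_c": ["Coastal Management", "Marine Mammals"],
}


def search_catalogue(name, query):
    """Return the catalogue name and the titles that contain the query."""
    titles = CATALOGUES[name]
    return name, [t for t in titles if query.lower() in t.lower()]


def meta_search(query):
    """Send one query to all catalogues in parallel and merge the hits."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of the catalogue names.
        for name, hits in pool.map(lambda n: search_catalogue(n, query), CATALOGUES):
            for title in hits:
                merged.setdefault(title, []).append(name)
    return merged


results = meta_search("marine")
print(results)
```

The merged result records which catalogues hold each title. Only a plain substring match is offered, which illustrates why the search options of such services tend to be limited to what all target systems have in common.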

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
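
The procedure described above can be sketched in a few lines. The thesaurus entry below is invented for illustration, and the OR syntax of the resulting query is an assumption; the query language of the target database should be checked in its help pages.

```python
# Hypothetical thesaurus entry with relations of the types UF, NT, BT and RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon"],
        "BT": ["water body"],
        "RT": ["coast"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term and its selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Terms of more than one word are quoted so they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "BT")))
```

Adding synonyms and narrower terms mainly increases recall. Choosing a more specific term found in the thesaurus, instead of the original word, is the complementary way to improve precision.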

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
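
Both directions of citation searching can be illustrated with a small citation graph. The document identifiers and reference lists below are invented; a real citation index would supply this data.

```python
# Hypothetical data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "new1"],
}


def snowball(doc, depth):
    """Follow reference lists back in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        # Collect the references of all documents found in the previous step.
        frontier = {r for d in frontier for r in REFERENCES.get(d, [])} - found
        found |= frontier
    return found


def cited_by(doc):
    """Look up the more recent documents that cite `doc` (citation index)."""
    return sorted(d for d, refs in REFERENCES.items() if doc in refs)


print(sorted(snowball("seed", 1)))
print(sorted(snowball("seed", 2)))
print(cited_by("seed"))
```

The first function only needs the documents themselves, since every publication lists its own references. The second needs an index built over many publications, which is why a citation index is a separate service.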

211

***-

Information retrieval:
using citations (scheme)
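[Scheme: a time axis running from past to future, with the seed document and "now" marked on it; snowball citation searching leads from the seed document back into the past, citation indexing leads forward to more recent documents.]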

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
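
Generating the variants can be automated, as in the sketch below. The `link:` prefix used to build the queries is an assumption about the syntax of the search system; as stated above, the help pages of the system that you want to use give the actual syntax.

```python
def url_variants(url):
    """Return the forms under which other pages may link to the same page."""
    variants = [url]
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            directory = url[: -len(index_name)]  # keeps the trailing slash
            variants.append(directory)
            variants.append(directory.rstrip("/"))
            break
    return variants


variants = url_variants("//web-server-computer.country/website/index.html")

# Assumed query syntax: one link search per variant.
queries = ["link:" + v for v in variants]
for query in queries:
    print(query)
```

The results of the separate searches should then be combined, because a citing page is normally counted under only one of the variants.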

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 151

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
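
The difference between alphabetical ordering and link-based ordering can be sketched as follows. The site names and link counts are invented for illustration, and the real Google ranking is more elaborate than a plain count of inbound links.

```python
# Hypothetical directory category: site name -> number of inbound links.
# The names and counts are invented for illustration only.
category = {
    "Aquatic Research Portal": 12,
    "Marine Biology Lab": 340,
    "Zooplankton Atlas": 95,
}

def alphabetical(entries):
    """Order entries by name, as described for the DMOZ directory."""
    return sorted(entries)

def by_inbound_links(entries):
    """Order entries by inbound link count, highest first."""
    return sorted(entries, key=lambda name: entries[name], reverse=True)

print(alphabetical(category))
print(by_inbound_links(category))
```

With these made-up numbers, the most linked site moves to the top of the category, although it sits in the middle of the alphabetical list.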

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
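
The two components described above can be illustrated with a minimal sketch. It assumes that the "robot" has already collected the pages (the URLs and texts below are hypothetical), and it treats a query of several words as an implicit AND, which is a choice made for this sketch and not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology databases and marine science",
}

def build_index(collected):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every query word (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the prepared index only; no page is contacted at search time, which is why results can lag behind the live WWW.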

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
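
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched with a simplified, textbook-style link-analysis iteration. This is not Google's actual algorithm; the damping value of 0.85, the number of iterations and the tiny link graph are assumptions made for this illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score: a page scores higher when many pages,
    or highly scored pages, link to it. `links` maps page -> list of targets."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page passes an equal share of its score to its targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all point to C; C points to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_rank(web)
print(max(scores, key=scores.get))
```

In this graph, page C collects links from three pages and ends up with the highest score; page A ranks above B and D because its single inbound link comes from the highly scored page C.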

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
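
What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and very small, and how Google implements the tilde operator internally is not stated here; the sketch only shows the principle of replacing a marked word by an OR-group of synonyms.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query):
    """Expand every word prefixed with '~' into an OR-group of synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("~cheap used ~car"))
# -> (cheap OR inexpensive OR affordable) used (car OR automobile OR vehicle)
```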

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
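
Recall and precision, the two quality measures named above, can be computed as in this small sketch. Precision is the share of retrieved documents that are relevant; recall is the share of all relevant documents that were retrieved. The document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
retrieved = ["d1", "d2", "d3", "d9"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6"]
print(precision_recall(retrieved, relevant))  # -> (0.75, 0.5)
```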

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
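
For readers who are not familiar with truncation, this is roughly what a retrieval system with manual right-hand truncation does. The asterisk as truncation symbol and the small vocabulary are assumptions for this sketch; systems differ in the symbol they use.

```python
def truncation_match(pattern, vocabulary):
    """Return vocabulary words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", vocabulary))
# -> ['librarian', 'libraries', 'library']
```

Without truncation or stemming, the searcher has to enter each of these word forms explicitly to cover the concept.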

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
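
A proximity operator such as NEAR can be sketched as follows. The default window of 5 words and the sample sentence are assumptions for this illustration; retrieval systems that offer NEAR define the distance in their own way.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True when term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)

text = "Marine pollution affects coastal fisheries and the wider ocean ecosystem."
print(near(text, "marine", "fisheries"))                  # -> True
print(near(text, "marine", "ecosystem", max_distance=3))  # -> False
```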

96

**--

Meta-search systems:
scheme 1
[Diagram labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram labels: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
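
Clustering search results on the fly can be illustrated with a deliberately naive sketch that only looks at the titles returned for a query. This is not the Vivisimo algorithm, which is not described here; the stopword list, the sample titles and the rule "use the most widely shared term as heading" are assumptions chosen to keep the example short.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to"}

def cluster_results(titles, query):
    """Group result titles under the most widely shared term each one contains."""
    query_terms = set(query.lower().split())

    def terms(title):
        words = set(re.findall(r"[a-z]+", title.lower()))
        return words - STOPWORDS - query_terms

    # Count in how many titles each term occurs.
    document_frequency = Counter(t for title in titles for t in terms(title))
    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Pick the most shared term; break ties alphabetically.
            heading = min(candidates, key=lambda t: (-document_frequency[t], t))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Blaise Pascal and probability theory",
]
for heading, members in cluster_results(titles, "pascal").items():
    print(heading, "->", members)
```

Everything is computed from the result list at search time, so no document has to be classified in advance.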

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram labels: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
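
Integrating results from several primary search systems, with removal of repeated results, can be sketched as below. The normalisation rule (ignore "www." and a trailing slash) and the round-robin interleaving are assumptions for this example; real meta-search systems use their own merging strategies.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Build a comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/"]
print(merge_results(engine_1, engine_2))
# -> ['http://www.example.org/', 'http://example.com/page', 'http://example.net/']
```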

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
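
The core of such a tracking service can be sketched as a comparison between two snapshots of page contents. The URLs and texts are hypothetical; a real service would fetch the pages periodically and send the alerts by email, which is left out here.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots (URL -> content) and report new and modified URLs."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Hypothetical snapshots taken one week apart.
last_week = {"http://example.org/a": "old text", "http://example.org/b": "same"}
today = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same",
    "http://example.org/c": "brand new page",
}
print(detect_changes(last_week, today))
# -> (['http://example.org/c'], ['http://example.org/a'])
```

Storing only the digests instead of the full texts would be enough to detect modifications, at the cost of not being able to show what changed.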

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even services that compare the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
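
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration under stated assumptions: the class `InterestProfile`, the functions `matches` and `alerts`, and all sample users and book titles are hypothetical, and no real alerting service is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: free keywords and/or subject categories."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, category):
    """Return True if a new book fits the stored interest profile."""
    words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = bool(words & {k.lower() for k in profile.keywords})
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


def alerts(profiles, new_books):
    """Yield (user, title) pairs for every new book that fits a profile."""
    for title, category in new_books:
        for profile in profiles:
            if matches(profile, title, category):
                yield profile.user, title


# Hypothetical sample data, invented for this sketch.
profiles = [
    InterestProfile("ann", keywords={"aquaculture"}),
    InterestProfile("bob", categories={"Oceanography"}),
]
new_books = [
    ("Aquaculture handbook", "Fisheries"),
    ("Introduction to oceanography", "Oceanography"),
    ("A history of typography", "Design"),
]
result = list(alerts(profiles, new_books))
```

A keyword profile matches words in the title, while a category profile matches the subject field assigned by the database producer; the two can be combined in one profile.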

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
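
The principle of simultaneous searching can be sketched in a few lines. The catalogue names and their contents below are hypothetical in-memory stand-ins, not real services, and the sketch assumes that every target accepts the same simple title query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalogues (not real services).
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["marine ecology", "Fish Biology"],
    "bookshop_c": ["Coastal Management", "Marine Policy"],
}


def search_one(name, query):
    """Search a single target; a real target could be slow or unavailable."""
    titles = CATALOGUES[name]
    return [(name, t) for t in titles if query.lower() in t.lower()]


def meta_search(query):
    """Send the same query to all targets in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        per_target = list(pool.map(lambda n: search_one(n, query), CATALOGUES))
    merged = {}
    for hits in per_target:
        for source, title in hits:
            # Group identical titles (ignoring case) under one entry.
            merged.setdefault(title.lower(), []).append(source)
    return merged


hits = meta_search("marine")
```

Only the lowest common denominator of the query options can be sent to every target, which is one reason why the search options of such services are limited.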

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full texts of journal articles in
particular subject domains, published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications
(for instance Elsevier, Emerald, Sage).
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or a thesaurus for searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
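
The use of a thesaurus before querying a database without controlled descriptors can be sketched as follows. The mini-thesaurus `THESAURUS` and the functions `expand` and `to_query` are hypothetical, and the `OR` syntax with quoted phrases is only an assumption about the target database; the help pages of the real system should be consulted.

```python
# Hypothetical mini-thesaurus; real thesauri are far larger.
THESAURUS = {
    "ocean": {"BT": ["water body"], "NT": ["deep sea"], "RT": ["coast"], "UF": ["sea"]},
}


def expand(term, relations=("UF", "NT")):
    """Return the term plus the terms reached through the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term.lower()]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return terms


def to_query(terms):
    """Combine terms with OR; multi-word terms become exact phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


query = to_query(expand("ocean"))
broader = expand("ocean", relations=("BT",))
```

Adding synonyms (UF) and narrower terms (NT) with OR tends to increase recall, whereas replacing a general term by one of its narrower terms tends to increase precision.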

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
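
Both directions of citation searching can be sketched with a small citation graph. The document identifiers and the reference lists in `REFERENCES` are hypothetical; a real citation index is built from the reference lists of many thousands of journals.

```python
from collections import defaultdict

# Hypothetical reference lists: document -> documents it cites.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990"],
    "c_2003": ["seed_2000"],
    "d_2004": ["seed_2000", "c_2003"],
}


def snowball(seed):
    """Follow reference lists backwards in time, transitively."""
    found, todo = [], [seed]
    while todo:
        for ref in REFERENCES.get(todo.pop(), []):
            if ref not in found:
                found.append(ref)
                todo.append(ref)
    return sorted(found)


def build_citation_index():
    """Invert the reference lists: document -> documents that cite it."""
    index = defaultdict(list)
    for citing, refs in REFERENCES.items():
        for cited in refs:
            index[cited].append(citing)
    return index


older = snowball("seed_2000")
newer = sorted(build_citation_index()["seed_2000"])
```

The snowball method needs only the documents themselves, while the forward search requires an inverted index that somebody has compiled in advance.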

211

***-

Information retrieval:
using citations (scheme)
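[Diagram: a time axis running from past to future, with “now” marked. Starting from the seed document, snowball citation searching leads back to older publications in the past, while citation indexing leads forward to more recent publications.]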

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
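
How such an estimate is derived from citation data can be sketched as follows. The author names, document identifiers and citation pairs are hypothetical sample data, and the option `exclude_self` is an illustrative choice rather than a rule used by any particular citation database. The resulting number depends entirely on the coverage of the database that is searched.

```python
from collections import Counter

# Hypothetical records: document -> its authors.
AUTHORS = {"doc1": ["Jansen"], "doc2": ["Jansen", "Peeters"], "doc3": ["Peeters"]}
# Hypothetical citation pairs: (citing document, cited document).
CITATIONS = [("x", "doc1"), ("y", "doc1"), ("y", "doc2"), ("z", "doc3"), ("doc2", "doc1")]


def citations_per_author(exclude_self=False):
    """Count the citations received by each author of a cited document."""
    counts = Counter()
    for citing, cited in CITATIONS:
        for author in AUTHORS.get(cited, []):
            # Optionally skip citations made in a paper by the same author.
            if exclude_self and author in AUTHORS.get(citing, []):
                continue
            counts[author] += 1
    return counts


all_counts = citations_per_author()
without_self = citations_per_author(exclude_self=True)
```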

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
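
Generating these variants can be automated. The function `url_variants` below is a hypothetical helper, and it assumes that `index.html` (or `index.htm`) is the default document of the directory, which is not true for every web server.

```python
def url_variants(url):
    """Return common spellings of the same address for link searches."""
    # Drop the scheme (http:) and any trailing slash.
    rest = url.split("//", 1)[-1].rstrip("/")
    # Drop an explicit default document name, if present (an assumption).
    for index_name in ("/index.html", "/index.htm"):
        if rest.endswith(index_name):
            rest = rest[: -len(index_name)]
    return [f"//{rest}/index.html", f"//{rest}/", f"//{rest}"]


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query, and the result sets can be combined.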

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
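
The difference between the two approaches can be sketched with the HTML parser of the Python standard library. The class `LinkCollector`, the function `cites` and the two sample pages are hypothetical; real search engines index pages in their own way.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets and visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

    def handle_data(self, data):
        self.text.append(data)


def cites(page_html, url):
    """Return (found_as_link, found_as_text) for a known URL."""
    parser = LinkCollector()
    parser.feed(page_html)
    parser.close()
    return url in parser.links, url in " ".join(parser.text)


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
# Hypothetical sample pages, invented for this sketch.
page_with_link = f'<p>See <a href="{TARGET}">this page</a>.</p>'
page_with_text = f"<p>Source: {TARGET}</p>"

link_result = cites(page_with_link, TARGET)
text_result = cites(page_with_text, TARGET)
```

A page that mentions the URL only as plain text is missed by a search in hyperlink fields, and a page that hides the URL behind anchor text is missed by a normal text search, so the two methods complement each other.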

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.
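
The difference between browsing a directory and searching it with a query can be sketched with a small tree. The structure `DIRECTORY`, its categories and its site titles are hypothetical, and the functions `browse` and `search` only illustrate the principle.

```python
# Hypothetical miniature directory: category -> subcategories or site titles.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["Site about plankton", "Site about coral reefs"]},
        "Earth sciences": {"Hydrology": ["Site about river basins"]},
    },
}


def browse(path):
    """Walk down the tree along a path of category names."""
    node = DIRECTORY
    for name in path:
        node = node[name]
    return sorted(node) if isinstance(node, dict) else node


def search(query, node=DIRECTORY, trail=()):
    """Match the query against category names and site titles only."""
    hits = []
    if isinstance(node, dict):
        for name, child in node.items():
            if query.lower() in name.lower():
                hits.append(" > ".join(trail + (name,)))
            hits += search(query, child, trail + (name,))
    else:
        hits += [" > ".join(trail + (s,)) for s in node if query.lower() in s.lower()]
    return hits


top_level = browse(["Science"])
coral = search("coral")
salinity = search("salinity")
```

The query for "salinity" returns nothing, even if the listed sites discuss salinity at length, because only the short directory entries are indexed and not the full text of the pages.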

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
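
The mechanism described above can be illustrated with a minimal sketch in Python. It is not the implementation of any real search engine: the page texts, the example.org URLs and the function names are invented for illustration, and the choice to combine query words with an implicit AND is an assumption of this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word points to the URLs that contain it.

    `pages` stands in for what a crawler ("robot") would have collected.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (assumed AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as if collected earlier by the robot.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
INDEX = build_index(PAGES)
print(search(INDEX, "marine hydrology"))
```

Note that the query is answered from the index that was built beforehand, not by visiting the pages at search time.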

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi --
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
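
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a small link-analysis sketch in the style of the published PageRank idea. This is only an illustration under stated assumptions: the slides do not describe Google's actual implementation, and the link graph, the damping value of 0.85 and the function name are chosen here for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively compute a link-based score for every page.

    `links` maps a page to the list of pages it points to.
    A page passes its score on to the pages it links to, so a link
    from a highly ranked page is worth more than one from a low one.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b point to c, c points to d.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph, page c scores higher than a and b because two pages point to it, and page d scores high although only one page points to it, because that page (c) is itself highly ranked.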

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
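
What such an automatic expansion does can be sketched as follows. This is a sketch only: the synonym table below is hypothetical, and the slides do not describe how Google selects its synonyms.

```python
# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS = {"car": ["automobile", "vehicle"]}


def expand_query(query, synonyms):
    """Turn a query into groups of alternative terms.

    A word marked with a leading tilde is replaced by the word itself
    plus its known synonyms (to be combined with OR); unmarked words
    are kept as they are.
    """
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([term])
    return groups


print(expand_query("cheap ~car rental", SYNONYMS))
```

Each inner list is a set of alternatives for one concept; only the word marked with the tilde is broadened.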

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
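
Recall and precision, the two measures mentioned above, can be computed as in the following sketch. The document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant ones are among the retrieved documents.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.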

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
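
For readers who are not familiar with truncation: in systems that do support it, a truncation symbol stands for any word ending. A minimal sketch, assuming an asterisk as the truncation symbol and an invented word list:

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a query term.

    A trailing asterisk means right-hand truncation: every word that
    starts with the given stem is matched. Without an asterisk only
    the exact word is matched.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


WORDS = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", WORDS))
print(truncation_match("library", WORDS))
```

Without truncation or stemming, the searcher has to enter the word variants (library, libraries, librarian) explicitly.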

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
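
The idea of clustering results on the fly can be illustrated with a deliberately naive sketch that groups result snippets by a keyword they share. This is not the algorithm of Vivisimo, which is not described in these slides; the snippets, the stop word list and the function name are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}


def cluster_results(query, snippets):
    """Group result snippets under a shared keyword, at search time.

    Only the retrieved snippets are analysed; no document is
    pre-processed before the search.
    """
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    words_per_snippet = []
    counts = Counter()
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words = words - STOPWORDS - query_words
        words_per_snippet.append(words)
        counts.update(words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        shared = [w for w in words if counts[w] > 1]
        # Label: the most widely shared word; ties are broken alphabetically.
        label = min(shared, key=lambda w: (-counts[w], w)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


SNIPPETS = [
    "Pascal programming language tutorial",
    "Pascal language compiler",
    "Blaise Pascal philosopher biography",
    "Pascal the philosopher and mathematician",
]
print(cluster_results("pascal", SNIPPETS))
```

With these four invented snippets, the results fall apart into one group about the programming language and one about the philosopher.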

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
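
Such an integration with removal of repeated results can be sketched as follows. The result lists, the normalisation rule and the interleaving order are assumptions of this sketch, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lowercase host plus path.

    Scheme, query string and a trailing slash are ignored here, which
    is a simplification chosen for this sketch.
    """
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs.

    The first hit of every engine comes first, then the second hits,
    and so on; a URL that was already seen is skipped.
    """
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://example.org/a", "http://example.org/b"]
ENGINE_2 = ["http://EXAMPLE.org/a/", "http://example.org/c"]
print(merge_results([ENGINE_1, ENGINE_2]))
```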

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
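
The difference between a modified page and a new page can be illustrated with the following sketch. It is not the method of any particular alert service: the function names and page contents are invented, and comparing a hash of the complete page text is just one simple way to detect a modification.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare the pages seen now with the fingerprints stored earlier.

    `previous` maps URL -> fingerprint from the last visit.
    `current_pages` maps URL -> page text fetched now.
    Returns two sorted lists: modified URLs and new URLs.
    """
    changed = []
    new = []
    for url, content in current_pages.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(content):
            changed.append(url)
    return sorted(changed), sorted(new)


# Hypothetical state from the last visit and pages fetched today.
PREVIOUS = {"u1": fingerprint("old text"), "u2": fingerprint("same text")}
CURRENT = {"u1": "new text", "u2": "same text", "u3": "brand new page"}
print(detect_changes(PREVIOUS, CURRENT))
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.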

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
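The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `InterestProfile`, `Book`, `matches` and `alerts` are hypothetical, the matching rule (any keyword in the title, or any shared subject category) is a simplification, and no real bookshop system is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical profile stored on the server of the alerting system."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


@dataclass
class Book:
    """Hypothetical record describing a newly published book."""
    title: str
    categories: set


def matches(profile, book):
    """Return True when the book fits the profile by keyword or by category."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    category_hit = bool(profile.categories & book.categories)
    return keyword_hit or category_hit


def alerts(profile, new_books):
    """Return the titles of the new books the user should be alerted about."""
    return [book.title for book in new_books if matches(profile, book)]


profile = InterestProfile(keywords={"aquaculture"}, categories={"Marine science"})
new_books = [
    Book("Aquaculture in Europe", {"Fisheries"}),
    Book("Ocean currents", {"Marine science"}),
    Book("Cooking for beginners", {"Food"}),
]
print(alerts(profile, new_books))
```

In this sketch the first book is selected through a keyword, the second through a subject category, and the third is not selected.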

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
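How a meta-search service might merge the answers of several target systems can be sketched as below. This is a simplified illustration, not the method of any of the services named here: the function `meta_search`, the use of an ISBN as merge key, and the catalogue functions are all hypothetical, and the targets are queried one after the other rather than in parallel.

```python
def meta_search(query, catalogues):
    """Send one query to several catalogues and merge the results.

    `catalogues` maps a catalogue name to a search function that returns
    (isbn, title) pairs; both are hypothetical stand-ins for real systems.
    """
    merged = {}
    unavailable = []
    for name, search in catalogues.items():
        try:
            records = search(query)
        except ConnectionError:
            # The result depends on the availability of each target system.
            unavailable.append(name)
            continue
        for isbn, title in records:
            entry = merged.setdefault(isbn, {"title": title, "found_in": []})
            entry["found_in"].append(name)
    return merged, unavailable


def library_a(query):
    return [("111", "Marine Ecology")]


def bookshop_b(query):
    return [("111", "Marine Ecology"), ("222", "Marine Biology")]


def library_c(query):
    raise ConnectionError("target system is down")


merged, unavailable = meta_search(
    "marine", {"A": library_a, "B": bookshop_b, "C": library_c}
)
print(merged)
print(unavailable)
```

The sketch also hints at why search options are limited: only a query form that every target understands (here a single plain string) can be passed on to all of them.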

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
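The use of a thesaurus before searching can be sketched as a simple query expansion. This is a minimal illustration under stated assumptions: the dictionary `THESAURUS` and its entries are invented for the example and are not taken from any real thesaurus system, and the Boolean `OR` syntax is assumed; the actual query syntax depends on the database that is searched.

```python
# Tiny hypothetical thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["water body"],    # broader term
        "NT": ["coastal sea"],   # narrower term
        "RT": ["ocean"],         # related term
        "UF": ["seas"],          # synonym / variant
    },
}


def expand_query(term, relations=("UF", "NT", "RT")):
    """Build an OR query from a term and its selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("BT",)))
```

Adding synonyms and narrower terms mainly increases recall; choosing one more suitable term instead of an ambiguous one mainly increases precision.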

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
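The two directions of citation searching can be sketched with a small citation graph. The data in `REFERENCES` and the function names are hypothetical and serve only as an illustration: following reference lists leads to older documents (snowball searching), while inverting the reference lists gives a citation index that leads to more recent documents.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the documents that it cites.
REFERENCES = {
    "seed_2000": ["old_1990", "old_1995"],
    "old_1995": ["old_1990"],
    "new_2003": ["seed_2000"],
    "new_2004": ["seed_2000", "new_2003"],
}


def snowball(doc, references):
    """Follow reference lists repeatedly, towards older documents."""
    found = []
    queue = list(references.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(references.get(current, []))
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return dict(index)


citation_index = build_citation_index(REFERENCES)
print(snowball("seed_2000", REFERENCES))
print(citation_index["seed_2000"])
```

The length of an entry in the citation index is the number of citations received by that document within the data set covered.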

211

***-

Information retrieval:
using citations (scheme)
[Scheme: time axis (past, now, future) with a seed document;
snowball citation searching points towards the past,
citation indexing points towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
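Generating the variants of a known URL before a link search can be sketched as follows. This is a minimal illustration: the function name `url_variants` is hypothetical, and it is an assumption that the default page of the site is called index.html or index.htm. The query syntax that must be put around each variant differs per search system and is therefore left out.

```python
def url_variants(url):
    """Return the common forms under which the same page may be linked."""
    base = url
    # Assumption: the default page of the site is index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then searched separately, and the results are combined.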

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
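Browsing a directory along its tree structure, without formulating a query, can be sketched as follows. The categories and site names in `DIRECTORY` are invented for the illustration and do not reproduce the classification of any real directory.

```python
# Hypothetical miniature directory; categories and sites are illustrative.
DIRECTORY = {
    "Science": {
        "Biology": {
            "Marine biology": ["site-a.example", "site-b.example"],
        },
        "Physics": ["site-c.example"],
    },
}


def browse(directory, path):
    """Walk down the tree along `path`.

    Returns the subcategories at that level, or the listed sites
    when the path ends at a leaf category.
    """
    node = directory
    for category in path:
        node = node[category]
    return sorted(node) if isinstance(node, dict) else node


print(browse(DIRECTORY, []))
print(browse(DIRECTORY, ["Science"]))
print(browse(DIRECTORY, ["Science", "Biology", "Marine biology"]))
```

At each step the user only chooses among the categories offered, which is why browsing suits broad searches that are difficult to formulate in words.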

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility of using the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
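
The "simply stated" rule above can be sketched as counting incoming links. The link graph and the function name `rank_by_inlinks` are assumptions for illustration; the real link analysis used by Google is more elaborate.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-d": ["site-c", "site-b"],
}


def rank_by_inlinks(entries, link_graph):
    """Sort directory entries by the number of pages linking to them."""
    inlinks = Counter(
        target for targets in link_graph.values() for target in set(targets)
    )
    # Most incoming links first; alphabetical order breaks ties.
    return sorted(entries, key=lambda entry: (-inlinks[entry], entry))


print(rank_by_inlinks(["site-a", "site-b", "site-c"], links))
# ['site-c', 'site-b', 'site-a']
```

A purely alphabetical listing would show site-a first; the link-based order puts the most linked site on top.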

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW
→ a global subject directory, which can lead to
  → a directory limited to sources in/of a country or region
  → a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
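
The two components described above can be sketched in a few lines: an index built automatically from the texts of collected pages, and a search function that consults only that index. The pages, URLs and function names are hypothetical, and the crawler itself is replaced by a ready-made dictionary.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a "robot" has collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "History of the encyclopedia",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collected):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(sorted(search(index, "marine")))
# ['http://example.org/a', 'http://example.org/b']
print(sorted(search(index, "marine hydrology")))
# ['http://example.org/b']
```

The search never touches the pages themselves at query time, which is why results can lag behind the live WWW.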

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
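
A minimal sketch of link-based scoring in the spirit of the two criteria above: a page scores higher when many pages point to it and when high-scoring pages point to it. This is a simplified, PageRank-style iteration and not Google's actual ranking; the damping value, the function name `link_scores` and the graph are assumptions.

```python
def link_scores(link_graph, damping=0.85, iterations=50):
    """Iteratively pass each page's score on to the pages it links to."""
    pages = set(link_graph)
    for targets in link_graph.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = link_graph.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * scores[page] / n
        scores = new
    return scores


# Hypothetical graph: "popular" receives links from three pages.
graph = {
    "hub": ["popular", "niche"],
    "popular": ["hub"],
    "niche": ["popular"],
    "lonely": ["popular"],
}
scores = link_scores(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph "hub" ends up close behind "popular" although only one page links to it, because that one page is itself highly scored.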

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
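
A sketch of what automatic synonym expansion of a tilde-marked word could look like. The thesaurus, the OR-group output format and the function name `expand_query` are assumptions for illustration; how Google implements the tilde internally is not described here.

```python
# Hypothetical thesaurus; a real system would draw on a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, thesaurus=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + thesaurus.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word as is
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("used ~car prices"))
# used (car OR automobile OR vehicle) prices
```

Only the marked word is widened; the other words of the query stay exactly as typed.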

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
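
Recall and precision, mentioned above, are simple ratios. The sketch below computes both for one result set; the document identifiers are made up, and the function name `precision_recall` is only illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(precision)          # 0.5
print(round(recall, 2))   # 0.67
```

A specialised index can improve both figures at once: fewer off-topic pages are retrieved, and more on-topic pages are present to be found.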

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
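
For readers unfamiliar with truncation, a sketch of what the feature does in systems that support it: a trailing `*` matches every indexed word that starts with the given stem. The vocabulary and the function name `expand_truncation` are assumptions for illustration.

```python
def expand_truncation(term, vocabulary):
    """Expand a term ending in '*' to all vocabulary words with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return sorted(word for word in vocabulary if word.startswith(prefix))
    word = term.lower()
    return [word] if word in vocabulary else []


# Hypothetical set of words that occur in an index.
vocabulary = {"librarian", "libraries", "library", "liberty", "index"}

print(expand_truncation("librar*", vocabulary))
# ['librarian', 'libraries', 'library']
print(expand_truncation("library", vocabulary))
# ['library']
```

Without this feature the searcher has to type the variants explicitly, for instance combined with OR.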

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
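
A sketch of what a proximity operator such as NEAR checks: whether two words occur within a given number of words of each other. The default distance, the sample text and the function name `near` are assumptions for illustration.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(
        abs(i - j) <= max_distance for i in positions_a for j in positions_b
    )


text = "Open access repositories give free access to scientific articles."
print(near(text, "free", "scientific", max_distance=3))   # True
print(near(text, "open", "articles", max_distance=3))     # False
```

A plain AND query would accept both pairs, because it only checks that the words occur somewhere in the same document.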

96

**--

Meta-search systems:
scheme 1
[Diagram with these elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with these elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with these elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
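
A toy sketch of clustering on the fly: the groups are derived only from the snippets returned for one query, with no processing of the documents beforehand. It labels each result with its most widely shared non-query word. This is a deliberately crude stand-in, not Vivisimo's algorithm; the snippets, the stop-word list and the function name `cluster_results` are assumptions.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "and", "of", "the"}


def cluster_results(snippets, query):
    """Group snippets under the non-query word they share with most others."""
    query_words = set(re.findall(r"\w+", query.lower()))
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        term_sets.append(words - query_words - STOP_WORDS)
    # Count in how many snippets each remaining word occurs.
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        if terms:
            # Most widely shared word wins; alphabetical order breaks ties.
            label = min(terms, key=lambda term: (-doc_freq[term], term))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal philosopher biography",
    "Free Pascal programming compiler",
]
for label, members in cluster_results(snippets, "pascal").items():
    print(label, members)
```

The two senses of the query end up under separate headings ("philosopher" and "programming"), which is the kind of help with ambiguity that clustering search systems aim for.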

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with these elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
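
A sketch of the integration step: result lists from several primary search systems are interleaved by rank, and repeated results are removed. Treating `www.` and trailing slashes as insignificant is an assumption of this sketch, as are the result lists and the function names.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (host without 'www.', plus path)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists and keep only the first copy of each result."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
engine_2 = ["http://dogpile.com", "http://www.kartoo.com/"]

print(merge_results([engine_1, engine_2]))
# ['http://www.dogpile.com/', 'http://vivisimo.com/', 'http://www.kartoo.com/']
```

Interleaving by rank is only one possible merging policy; a real meta-search system may weight the primary systems differently.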

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram with these elements:
»Internet: telnet, ftp, ...
»WWW
»Databases and file archives accessible through the Internet (CGI, ASP,...)
»Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files
»PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
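
A sketch of how a tracking service can tell modified pages from new ones: it stores a fingerprint of each page it has seen and compares it with the current content. The page contents are supplied as in-memory strings here; fetching them from the WWW and sending e-mail alerts are left out, and all names are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length digest of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints (url -> digest) with current pages
    (url -> content) and return (changed URLs, new URLs)."""
    changed, new = [], []
    for url, content in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(content):
            changed.append(url)
    return sorted(changed), sorted(new)


# Hypothetical state saved at the previous visit.
previous = {
    "http://example.org/news": fingerprint("Conference announced"),
    "http://example.org/about": fingerprint("About us"),
}
# Hypothetical content found at the current visit.
current = {
    "http://example.org/news": "Conference announced; programme online",
    "http://example.org/about": "About us",
    "http://example.org/programme": "Day 1: open access sources",
}

print(detect_changes(previous, current))
# (['http://example.org/news'], ['http://example.org/programme'])
```

Storing digests rather than full copies keeps the service's storage small, at the cost of not being able to show what exactly changed.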

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
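
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration of the idea described above, not the method of any particular bookshop: the `InterestProfile` structure, the function names and the sample books are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: keywords and/or subject categories (hypothetical structure)."""
    keywords: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


def matches(profile: InterestProfile, title: str, category: str) -> bool:
    """Return True when a new book fits the profile by keyword or by category."""
    # Compare whole words of the title, ignoring case and trailing punctuation.
    title_words = {word.strip(".,:;!?").lower() for word in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


def alerts(profile, new_books):
    """Select the titles of the new books that should trigger an alert."""
    return [title for title, category in new_books if matches(profile, title, category)]
```

A real service would probably also match on authors, descriptions or subject terms; the sketch only shows that both profile forms (keywords and subject categories) can be checked against each newly published title.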

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
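
The principle of simultaneous searching can be sketched as follows. This is a minimal illustration, assuming small in-memory dictionaries as stand-ins for remote catalogues; the names `CATALOGUE_A`, `CATALOGUE_B`, `search_catalogue`, `unavailable_catalogue` and `meta_search` are invented for the example and do not describe how any existing meta-search service is built.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues: each maps a query word to titles.
CATALOGUE_A = {"oceanography": ["Introduction to Oceanography", "Ocean Currents"]}
CATALOGUE_B = {"oceanography": ["Ocean Currents", "Marine Chemistry"]}


def search_catalogue(catalogue, query):
    """Simulate one target system; a real one would be queried over the network."""
    return catalogue.get(query.lower(), [])


def unavailable_catalogue(query):
    """Simulate a target system that is down at search time."""
    raise ConnectionError("target system not available")


def meta_search(query, targets):
    """Send the query to all targets in parallel and merge the results without duplicates.

    `targets` must be a non-empty dict of name -> search function.
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    # Leaving the "with" block waits until every target has answered or failed.
    for name, future in futures.items():
        try:
            for title in future.result():
                if title not in merged:
                    merged.append(title)
        except ConnectionError:
            failed.append(name)  # the result depends on the availability of the targets
    return merged, failed
```

The sketch also shows why search options are limited: only a query form that every target understands (here a single word) can be passed on to all of them.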

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
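
The use of a thesaurus to prepare a query can be sketched as follows. This is a minimal illustration with a toy thesaurus whose entries are invented for the example; the names `THESAURUS`, `expand_query` and `to_boolean_query` are assumptions, and the OR syntax is only one common query convention that a given database may or may not support.

```python
# Toy thesaurus with invented entries; relation codes follow the BT/NT/RT/UF scheme.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"], "RT": ["tide"], "UF": ["ocean"]},
}


def expand_query(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found via the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_boolean_query(terms):
    """Combine the terms with OR, quoting multi-word terms as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms with OR mainly serves recall; replacing a vague word by a more suitable term found in the thesaurus mainly serves precision.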

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
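
Both directions of citation searching can be sketched on a small citation graph. The data below are invented for the example, and the function names `snowball` and `citing_documents` are assumptions; a real citation index works on a far larger scale, but the two look-ups are the same in principle.

```python
# Invented example data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old1"],
}


def snowball(seed, references):
    """Follow reference lists backward in time, collecting all older documents reached."""
    found, queue = [], list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found


def citing_documents(target, references):
    """Look up, as a citation index does, the more recent documents that cite the target."""
    return sorted(doc for doc, cited in references.items() if target in cited)
```

Snowball searching needs only the reference lists of the documents at hand, while finding citing documents requires an index over many publications, which is why a citation index is needed for the forward direction.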

211

***-

Information retrieval:
using citations (scheme)
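Diagram: a time axis running from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.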

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
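
The variants listed above can be generated automatically before running a link search. This is a minimal sketch: the function name `url_variants` is invented for the example, and it assumes that the default page of the site is called index.html, which differs between web servers.

```python
def url_variants(url):
    """Return the common spellings of a site URL to try in a link search."""
    base = url.rstrip("/")
    # Assumption: the default page is named index.html.
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]
```

Each variant would then be submitted separately, in the query syntax required by the chosen search system, and the results combined.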

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
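
The "simply stated" rule above (more received links, higher rank) can be sketched in a few lines. This is only an illustration of counting inbound links; the function name, the site names and the link list are hypothetical, and the real ranking used by Google is more elaborate.

```python
from collections import Counter

def rank_by_inbound_links(entries, links):
    """Sort directory entries by the number of links they receive, highest first.

    entries: list of site names listed in one directory category (hypothetical).
    links:   list of (source, target) pairs observed on the WWW (hypothetical).
    """
    # Count inbound links per target; a site linking to itself is ignored.
    inbound = Counter(target for source, target in links if source != target)
    # Ties are broken alphabetically, as in a plain alphabetical directory.
    return sorted(entries, key=lambda site: (-inbound[site], site))

entries = ["alpha", "beta", "gamma"]
links = [("x", "beta"), ("y", "beta"), ("x", "gamma"), ("beta", "beta")]
print(rank_by_inbound_links(entries, links))  # ['beta', 'gamma', 'alpha']
```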

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
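
The two components described above (an automatically built index and a search function on top of it) can be sketched as follows. This is a minimal illustration, not the design of any particular search engine: the pages are hypothetical stand-ins for what a "robot" would collect, and combining query words with AND is a choice made for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "History of the North Sea coast",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine sea")))  # ['http://example.org/a']
```

Note that the query is answered from the stored index only; no page is fetched at query time, which is the point made on the previous slide.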

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
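
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not the actual Google algorithm: the damping value, the number of iterations and the tiny link graph are chosen only for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Compute link-based scores for the pages in a list of (source, target) links.

    Each page repeatedly passes a share of its own score to the pages it
    links to, so a link from a high-scoring page counts for more.
    """
    pages = sorted({page for pair in links for page in pair})
    n = len(pages)
    outgoing = {p: [t for s, t in links if s == p] for p in pages}
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = outgoing[p] or pages
            share = damping * score[p] / len(targets)
            for t in targets:
                new[t] += share
        score = new
    return score

# Hypothetical graph: a and b point to c, and c points to d.
scores = link_scores([("a", "c"), ("b", "c"), ("c", "d")])
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph, d receives only one link, but it comes from the well-linked page c, so d still scores higher than a or b.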

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
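
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus, its relation names and the OR syntax of the output are assumptions made for this illustration; a real thesaurus is far larger and each retrieval system has its own query syntax.

```python
# Hypothetical mini-thesaurus; the entries are for illustration only.
THESAURUS = {
    "fish": {"synonyms": [], "narrower": ["cod", "herring"], "related": ["fisheries"]},
    "sea": {"synonyms": ["ocean"], "narrower": [], "related": ["marine"]},
}

def expand_query(query, thesaurus, relations=("synonyms", "narrower")):
    """Replace each known word by an OR-group of the word and its thesaurus terms."""
    parts = []
    for word in query.lower().split():
        entry = thesaurus.get(word, {})
        terms = [word]
        for relation in relations:
            for term in entry.get(relation, []):
                if term not in terms:
                    terms.append(term)
        if len(terms) > 1:
            parts.append("(" + " OR ".join(terms) + ")")
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("fish sea pollution", THESAURUS))
# (fish OR cod OR herring) (sea OR ocean) pollution
```

Which relations to include is a judgment call: adding related terms as well usually retrieves more documents but also more irrelevant ones.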

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
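
Recall and precision, the two measures mentioned above, can be computed as follows. The document identifiers are hypothetical; in practice the set of all relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 3 relevant documents exist.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(precision, round(recall, 2))  # 0.5 0.67
```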

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
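
To make clear what truncation means, the following sketch expands a right-truncated term such as librar* into the matching words of a vocabulary. The vocabulary is hypothetical; a system that supports truncation would do this against its own index. When truncation is not offered, the searcher has to type the variants explicitly, combined with OR.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words that match a right-truncated term like 'librar*'."""
    if not pattern.endswith("*"):
        # No truncation sign: only the exact word can match.
        return [pattern] if pattern in vocabulary else []
    stem = pattern[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))

# Hypothetical index vocabulary.
VOCABULARY = {"library", "libraries", "librarian", "liberal"}
variants = expand_truncation("librar*", VOCABULARY)
print(" OR ".join(variants))  # librarian OR libraries OR library
```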

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
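
A proximity operator such as NEAR, the first item in the list above, can be illustrated with a small sketch. The function name and the default distance of 5 words are assumptions for this example; systems that offer NEAR define their own distance.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

text = "Marine pollution in the North Sea"
print(near(text, "marine", "sea"))                  # True (5 words apart)
print(near(text, "marine", "sea", max_distance=3))  # False
```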

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
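
Clustering on the fly, using only the retrieved results, can be illustrated with a deliberately naive sketch: each result is filed under the term it shares with the most other results. This is not the method used by Vivisimo; the stopword list, the labelling rule and the example snippets are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query term they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(words - query_words - STOPWORDS)
    # In how many snippets does each term occur?
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        # Label: the term shared with the most snippets; ties broken alphabetically.
        label = min(terms, key=lambda t: (-doc_freq[t], t)) if terms else "other"
        clusters[label].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Thoughts by the philosopher Pascal",
    "Turbo Pascal programming compiler",
]
for label, group in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", group)
```

For the ambiguous query pascal, the four hypothetical snippets end up in two groups, labelled philosopher and programming.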

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
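
The integration of results with removal of repeated results can be sketched as follows. The result lists are hypothetical, and the rule for deciding that two URLs are "the same" (ignoring letter case in the host name, a leading www. and a trailing slash) is an assumption of this sketch; real meta-search systems may use other rules.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a key for comparison (host without 'www.', no trailing slash)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked lists from several search systems, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.org/a"]
engine_2 = ["http://example.org", "http://example.org/b"]
print(merge_results([engine_1, engine_2]))
# ['http://www.example.org/', 'http://example.org/a', 'http://example.org/b']
```

Interleaving by rank is only one possible design; it treats all primary search systems as equally trustworthy.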

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
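
How a tracking system can tell modified pages from new ones can be sketched by storing a fingerprint of each page and comparing it at the next visit. This is a minimal sketch that works on page texts already fetched; the URLs and texts are hypothetical, and real alert services also have to fetch the pages, schedule the checks and send the e-mail.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring differences in whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def check_pages(current_pages, stored_fingerprints):
    """Label each URL as 'new', 'changed' or 'unchanged' and update the stored fingerprints."""
    report = {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored_fingerprints:
            report[url] = "new"
        elif stored_fingerprints[url] != digest:
            report[url] = "changed"
        else:
            report[url] = "unchanged"
        stored_fingerprints[url] = digest
    return report

store = {}
print(check_pages({"http://example.org/news": "Version 1"}, store))
# {'http://example.org/news': 'new'}
print(check_pages({"http://example.org/news": "Version 2"}, store))
# {'http://example.org/news': 'changed'}
```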

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
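
A rough sketch of the meta-search idea, assuming each target catalogue can only be asked a simple title-word question (which is why the search options stay limited). The catalogue names and records are invented stand-ins for remote systems, not the interfaces of any service named on these slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues, each holding a few records.
CATALOGUES = {
    "library_a": [{"isbn": "111", "title": "Marine Ecology"}],
    "library_b": [{"isbn": "111", "title": "Marine ecology"},
                  {"isbn": "222", "title": "Coastal Management"}],
    "bookshop_c": [{"isbn": "333", "title": "Fisheries Science"}],
}


def search_one(name: str, query: str) -> list[dict]:
    """Search a single (simulated) catalogue with a simple title-word match."""
    hits = []
    for record in CATALOGUES[name]:
        if query.lower() in record["title"].lower():
            hits.append({**record, "source": name})
    return hits


def meta_search(query: str) -> list[dict]:
    """Query all catalogues in parallel and merge records that share an ISBN."""
    names = sorted(CATALOGUES)
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the results in the same order as `names`.
        result_lists = list(pool.map(lambda n: search_one(n, query), names))

    merged: dict[str, dict] = {}
    for hits in result_lists:
        for hit in hits:
            entry = merged.setdefault(
                hit["isbn"], {"isbn": hit["isbn"], "title": hit["title"], "sources": []}
            )
            entry["sources"].append(hit["source"])
    return list(merged.values())


if __name__ == "__main__":
    for book in meta_search("marine"):
        print(book["title"], "found in", ", ".join(book["sources"]))
```

Merging on ISBN is one possible choice for removing duplicates; records without an ISBN would need a fallback such as a normalised title.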

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

(Scheme: Author / Sender, Editor, Reader / Receiver)

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
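
The four relations on this slide can be modelled as a small data structure. The following is only a sketch: the class, the helper function and the sample terms are assumptions made for illustration and are not taken from any real thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str
    broader: set[str] = field(default_factory=set)    # BT
    narrower: set[str] = field(default_factory=set)   # NT
    related: set[str] = field(default_factory=set)    # RT
    used_for: set[str] = field(default_factory=set)   # UF (non-preferred synonyms)


def add_hierarchy(thesaurus: dict[str, ThesaurusEntry], broad: str, narrow: str) -> None:
    """Record that `narrow` is a narrower term of `broad`, keeping BT and NT reciprocal."""
    thesaurus.setdefault(broad, ThesaurusEntry(broad)).narrower.add(narrow)
    thesaurus.setdefault(narrow, ThesaurusEntry(narrow)).broader.add(broad)


# Illustrative terms only; not taken from any real thesaurus.
thesaurus: dict[str, ThesaurusEntry] = {}
add_hierarchy(thesaurus, "water bodies", "oceans")
add_hierarchy(thesaurus, "oceans", "Atlantic Ocean")
thesaurus["oceans"].used_for.add("seas")
thesaurus["oceans"].related.add("oceanography")
```

Storing BT and NT through one helper keeps the two directions consistent: every broader-term link automatically has its matching narrower-term link.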

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
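
As a hedged illustration of this workflow, the sketch below expands each search concept with synonyms looked up beforehand in a thesaurus and combines them into one query. The synonym lists are invented, and the AND / OR syntax is an assumption: the operators actually accepted differ from one database to another.

```python
# Illustrative thesaurus excerpt; the terms are assumptions, not from a real system.
SYNONYMS = {
    "fishing": ["fisheries", "angling"],
    "ocean": ["sea", "marine"],
}


def expand_term(term: str) -> str:
    """Return the term OR-ed with its thesaurus synonyms, quoting multi-word phrases."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(terms: list[str]) -> str:
    """Combine the expanded concepts with AND."""
    return " AND ".join(expand_term(t) for t in terms)


if __name__ == "__main__":
    print(build_query(["fishing", "ocean"]))
```

OR-ing synonyms mainly raises recall; precision improves when the thesaurus suggests a more specific term to search with instead.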

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
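
The two directions of citation searching can be sketched on a toy citation graph. The document names below are hypothetical; in practice the backward step reads reference lists, while the forward step needs a citation index.

```python
from collections import deque

# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old2"],
}


def snowball(seed: str) -> set[str]:
    """Follow reference lists backwards in time, starting from the seed document."""
    found: set[str] = set()
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for cited in REFERENCES.get(doc, []):
            if cited not in found:
                found.add(cited)
                queue.append(cited)
    return found


def citing_documents(target: str) -> set[str]:
    """Emulate a citation index lookup: which documents cite the target?"""
    return {doc for doc, refs in REFERENCES.items() if target in refs}


if __name__ == "__main__":
    print("Older, via snowballing:", sorted(snowball("seed")))
    print("More recent, via citation index:", sorted(citing_documents("seed")))
```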

211

***-

Information retrieval:
using citations (scheme)
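(Scheme: a time axis from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.)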

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
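
The variants listed above can be generated automatically, so that none is forgotten when searching for links. This is a minimal sketch: the function name is hypothetical, and the list of default file names is an assumption that is not exhaustive.

```python
from urllib.parse import urlsplit


def url_variants(url: str) -> list[str]:
    """List the common spellings of one address: with a default file name,
    with a trailing slash, and without a trailing slash."""
    parts = urlsplit(url)
    path = parts.path
    # Strip a default file name such as index.html (assumed list, not exhaustive).
    for default_name in ("index.html", "index.htm"):
        if path.endswith("/" + default_name):
            path = path[: -len(default_name)]
            break
    base = f"//{parts.netloc}{path.rstrip('/')}"
    return [base + "/index.html", base + "/", base]


if __name__ == "__main__":
    for variant in url_variants("http://web-server-computer.country/website/index.html"):
        print(variant)
```

Each variant would then be submitted separately, using the link-search syntax of the chosen search system.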

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has added many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
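
The two components described above can be pictured with a minimal sketch. It assumes a tiny, made-up set of pages held in memory (the `PAGES` dictionary and its example.org URLs are hypothetical stand-ins for what a robot would collect), builds a word index from their texts, and answers a query from that index rather than from the live Internet. It is an illustration only, not the implementation of any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages collected by a crawler ("robot").
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Civil engineering of sea dikes",
}

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words (implicit Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

Because the query is answered from the prebuilt index, a page that changed after the last robot visit is found (or missed) on the basis of its old text.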

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited to a
maximum of 10 words per query.)
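
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis sketch. The function name `link_rank`, the damping value and the five-page link graph are assumptions made for illustration; this is a textbook-style iteration, not Google's actual ranking algorithm.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score.

    links maps each page to the list of pages it points to.
    A page scores higher when many pages, or high-scoring pages, point to it.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, C points to D, E points to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph C outranks A because two pages point to it, and D outranks A although each has only one incoming link, because D's single link comes from the high-scoring page C.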

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
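
The effect of the tilde can be pictured as replacing the marked word by an OR group of that word and its synonyms. The sketch below assumes a small hand-made synonym table (`SYNONYMS` is hypothetical) and only rewrites the query text; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand each ~word into an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
# prints: cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table holds synonyms for it.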

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
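
Recall and precision, the two measures named above, can be computed for a single search when it is known which documents are relevant. The sketch below uses made-up document identifiers (`d1` to `d5` are hypothetical): precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are relevant,
# and 2 of the relevant documents are among those retrieved.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d3", "d5"]
precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.50, recall = 0.67
```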

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
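
The integration step described above can be sketched as follows, assuming that each primary search system returns a ranked list of URLs. The function names and the normalisation rule (ignore letter case in the host name, a leading "www." and a trailing slash) are assumptions chosen for illustration; real meta-search systems may merge and deduplicate differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key, so that near-identical addresses match."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.onefish.org/", "http://oceanportal.org/"]
ENGINE_2 = ["http://onefish.org", "http://www.eevl.ac.uk/"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
print(MERGED)
```

Interleaving by rank position keeps the top hit of every system near the top of the merged list; the second address for onefish.org is recognised as a repeat and dropped.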

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram with the elements:
• telnet, ftp, ...
• Internet / WWW
• Databases and file archives accessible through the Internet (CGI, ASP, ...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
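
The difference between a modified page and a new page can be sketched with content fingerprints: a page whose address was not seen before is new, and a known page whose fingerprint differs from the stored one has changed. The sketch below works on page contents that are already in memory (the example.org addresses and texts are hypothetical) and does not fetch anything from the WWW; actual alert services may detect changes in other ways.

```python
import hashlib

def fingerprint(content):
    """Return a stable fingerprint of page content given as bytes."""
    return hashlib.sha256(content).hexdigest()

def check_pages(current_pages, stored_fingerprints):
    """Compare pages with the fingerprints kept from the previous visit.

    Returns (new_urls, changed_urls) and updates stored_fingerprints in place.
    """
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        previous = stored_fingerprints.get(url)
        if previous is None:
            new_urls.append(url)
        elif previous != digest:
            changed_urls.append(url)
        stored_fingerprints[url] = digest
    return new_urls, changed_urls

# Hypothetical page contents at two successive visits.
stored = {}
first_visit = {"http://example.org/news": b"version 1"}
second_visit = {
    "http://example.org/news": b"version 2",
    "http://example.org/new-page": b"hello",
}
print(check_pages(first_visit, stored))   # the page is reported as new
print(check_pages(second_visit, stored))  # one new page, one changed page
```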

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
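
How a stored interest profile could be matched against newly published books can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any bookshop: the classes `InterestProfile` and `Book`, the matching rule and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user's interests as stored on the server (hypothetical structure)."""
    keywords: set[str] = field(default_factory=set)
    subject_categories: set[str] = field(default_factory=set)


@dataclass
class Book:
    title: str
    subject_categories: set[str]


def matches(profile: InterestProfile, book: Book) -> bool:
    """A book matches when a keyword occurs in its title or a subject category overlaps."""
    title_words = {word.strip(".,:;!?").lower() for word in book.title.split()}
    keyword_hit = any(keyword.lower() in title_words for keyword in profile.keywords)
    category_hit = bool(profile.subject_categories & book.subject_categories)
    return keyword_hit or category_hit


# Invented sample data.
profile = InterestProfile(keywords={"aquaculture"}, subject_categories={"Marine science"})
new_books = [
    Book("Aquaculture and the environment", {"Fisheries"}),
    Book("Introduction to library science", {"Information science"}),
]
alerts = [book.title for book in new_books if matches(profile, book)]
print(alerts)
```

Keywords give fine-grained alerts but miss books that use other words; subject categories are broader and depend on how well the bookshop classifies its titles.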

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
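
The principle of simultaneous searching can be sketched as follows. This is a minimal illustration, assuming two in-memory catalogues and merging on ISBN; the catalogue names, the records and the merging rule are hypothetical, and real target systems would each need their own network query and result parsing.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the target catalogues.
CATALOGUES = {
    "library_a": [{"isbn": "111", "title": "Marine ecology"},
                  {"isbn": "222", "title": "Coastal management"}],
    "bookshop_b": [{"isbn": "111", "title": "Marine ecology"},
                   {"isbn": "333", "title": "Marine pollution"}],
}


def search_catalogue(name: str, term: str) -> list[dict]:
    """Search one catalogue for titles that contain the term (case-insensitive)."""
    return [rec for rec in CATALOGUES[name] if term.lower() in rec["title"].lower()]


def meta_search(term: str) -> list[dict]:
    """Query all catalogues in parallel and merge the results, one record per ISBN."""
    names = sorted(CATALOGUES)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_catalogue(name, term), names))
    merged: dict[str, dict] = {}
    for records in result_lists:
        for rec in records:
            merged.setdefault(rec["isbn"], rec)  # keep the first record seen per ISBN
    return list(merged.values())


print([rec["title"] for rec in meta_search("marine")])
```

The sketch also shows why search options are limited in such services: only a query type that every target system supports (here a simple title word) can be sent to all of them.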

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
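
The thesaurus relations can be represented as a small data structure. This is a minimal sketch, not the format of any existing thesaurus system: the class `ThesaurusEntry`, the sample terms and the consistency check are hypothetical and serve only to show that BT and NT mirror each other.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with its relations, named after the usual abbreviations."""
    term: str
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms
    uf: list[str] = field(default_factory=list)  # synonyms this term is used for


# Invented sample entries, for illustration only.
entries = {
    "water body": ThesaurusEntry("water body", nt=["ocean", "lake"]),
    "ocean": ThesaurusEntry("ocean", bt=["water body"], rt=["coast"], uf=["sea"]),
    "lake": ThesaurusEntry("lake", bt=["water body"]),
}


def missing_reciprocals(thesaurus: dict[str, ThesaurusEntry]) -> list[tuple[str, str]]:
    """If term A has BT B, then B should list A as NT; report pairs where it does not."""
    problems = []
    for entry in thesaurus.values():
        for broader in entry.bt:
            parent = thesaurus.get(broader)
            if parent is None or entry.term not in parent.nt:
                problems.append((entry.term, broader))
    return problems


print(entries["ocean"].uf)
print(missing_reciprocals(entries))
```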

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
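
The procedure described above (first collect terms from a thesaurus, then formulate the query) can be sketched as follows. This is a minimal illustration that assumes the target database accepts Boolean queries with AND, OR, parentheses and quoted phrases; the actual syntax differs between search systems, and the function name and sample terms are hypothetical.

```python
def expand_query(concepts: list[str], found_terms: dict[str, list[str]]) -> str:
    """Combine the terms of each concept with OR, then join the concepts with AND."""
    groups = []
    for concept in concepts:
        terms = [concept] + found_terms.get(concept, [])
        # Put multi-word terms between quotes so that they are searched as a phrase.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Invented terms, as they might be found in a general thesaurus.
found = {"ocean": ["sea", "marine"], "pollution": ["contamination"]}
print(expand_query(["ocean", "pollution"], found))
# prints: (ocean OR sea OR marine) AND (pollution OR contamination)
```

Adding synonyms with OR mainly increases recall; choosing more specific terms from the thesaurus mainly increases precision.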

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
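
Both directions of citation searching can be sketched with a small set of reference lists. This is a minimal illustration with invented document identifiers; it is not how any citation database is built, but it shows that a citation index is the inversion of the reference lists.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the older documents it cites.
references = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990", "c_1985"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "d_2003"],
}


def snowball(seed: str) -> set[str]:
    """Follow reference lists backwards in time, starting from the seed document."""
    found, to_visit = set(), [seed]
    while to_visit:
        for cited in references.get(to_visit.pop(), []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def build_citation_index() -> dict[str, set[str]]:
    """Invert the reference lists: for each document, which later documents cite it?"""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return dict(cited_by)


print(sorted(snowball("seed_2000")))                   # older publications
print(sorted(build_citation_index()["seed_2000"]))     # more recent publications
```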

211

***-

Information retrieval:
using citations (scheme)
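[Scheme: a time axis running from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, while citation indexing leads forward to more recent publications.]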

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
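
Estimating these numbers from the results of a citation search can be sketched as follows. This is a minimal illustration with invented records; the pairs and the author names are hypothetical, and in practice the counts depend on the coverage of the database that is searched.

```python
from collections import Counter

# Hypothetical result of a citation search: (citing article, cited article) pairs.
citations = [("d", "seed"), ("e", "seed"), ("e", "b"), ("f", "b"), ("f", "c")]

# Hypothetical author of each cited article.
authors = {"seed": "Author A", "b": "Author A", "c": "Author B"}

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited in citations)

# Number of citations received by each author, summed over that author's articles.
per_author = Counter(authors[cited] for _, cited in citations if cited in authors)

print(per_article["seed"], per_article["b"], per_article["c"])
print(per_author["Author A"], per_author["Author B"])
```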

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
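
How links to a particular page can be recognised and counted is sketched below. This is a minimal illustration that works on a few invented HTML fragments instead of a real Internet index; the page addresses and the function `pages_linking_to` are hypothetical, while `HTMLParser` comes from the Python standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target: str, pages: dict[str, str]) -> list[str]:
    """Return the addresses of the pages whose HTML contains a link to the target."""
    citing = []
    for address, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            citing.append(address)
    return citing


# Invented pages: a.html and c.html link to the target, b.html only mentions it as text.
target = "http://example.org/target.html"
pages = {
    "http://example.org/a.html": '<p>See <a href="http://example.org/target.html">this page</a>.</p>',
    "http://example.org/b.html": "<p>Plain text: http://example.org/target.html</p>",
    "http://example.org/c.html": '<a href="http://example.org/other.html">other</a> '
                                 '<a href="http://example.org/target.html">target</a>',
}
citing_pages = pages_linking_to(target, pages)
print(len(citing_pages), citing_pages)
```

The length of the result is the number of web citations in this small collection; the list itself answers the question of who linked to the page.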

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
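
Generating the variants listed above can be automated. This is a minimal sketch that assumes the default document of the directory is named index.html (or index.htm); web servers may use other names, and the function name is hypothetical.

```python
def url_variants(url: str) -> list[str]:
    """List spellings of the same address that other pages may have linked to."""
    base = url
    # Remove an explicit default document name, if present (assumed names).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate query to the search system, following the query syntax described in its help pages.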

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
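
Browsing such a tree can be sketched in a few lines of Python. This is only an illustration: the categories, the example.org URLs and the function name browse are made up here and do not come from any real directory.

```python
# Hypothetical miniature subject directory: nested categories, with lists of
# selected sites at the leaves. All names and URLs are invented examples.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydrology"]},
    },
    "Reference": {"Dictionaries": ["http://example.org/dictionary"]},
}


def browse(directory, path):
    """Follow a path of category names and return what is on that 'shelf'.

    Returns the sorted subcategory names for an inner node,
    or the list of selected sites for a leaf.
    """
    node = directory
    for category in path:
        node = node[category]
    if isinstance(node, dict):
        return sorted(node)
    return list(node)


print(browse(DIRECTORY, []))
print(browse(DIRECTORY, ["Science", "Biology", "Marine biology"]))
```

No query has to be formulated: the user only chooses among the categories offered at each level.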

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
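
The simplified statement above (more links received means a higher rank) can be sketched as follows. This is an assumption-laden illustration, not Google's actual method: the link graph, the site names and the function rank_by_inlinks are invented.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-c": [],
    "site-d": ["site-c", "site-b"],
}


def rank_by_inlinks(links, category_sites):
    """Order the sites of one directory category by number of links received.

    Ties are broken alphabetically, which is the order a plain
    directory listing would have used anyway.
    """
    inlinks = Counter(target for targets in links.values() for target in targets)
    return sorted(category_sites, key=lambda site: (-inlinks[site], site))


print(rank_by_inlinks(LINKS, ["site-a", "site-b", "site-c"]))
```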

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
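
A minimal sketch of this mechanism, assuming a tiny in-memory collection of pages instead of a real robot that collects pages from the Internet. The URLs and the names tokenize, build_index and search are made up for illustration; combining the query words with AND is a choice made for this sketch only.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment transport",
    "http://example.org/c": "Open access journals in biology",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the prepared index only; no page is fetched at search time, which is why such a system can respond quickly.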

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
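
The two ranking rules above (many pages point to it, "important" pages point to it) can be sketched with a small iterative computation in the spirit of the published PageRank idea. This is an illustration under assumptions, not Google's production algorithm: the function name link_rank, the damping value 0.85 and the example graph are chosen here.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links maps each page to the list of pages it points to.
    Scores always sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new[other] += share
        score = new
    return score


# Hypothetical graph: page "c" receives the most links.
LINKS_EXAMPLE = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, value in sorted(link_rank(LINKS_EXAMPLE).items()):
    print(page, round(value, 3))
```

In this example page "a" scores well although only one page points to it, because that one page ("c") is itself important.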

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
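
Expanding a query with thesaurus terms can be sketched as below. The mini-thesaurus, the function name expand_query and the AND/OR output syntax are assumptions for illustration; a real retrieval system may use a different query syntax.

```python
# Hypothetical mini-thesaurus; in practice the synonyms would be looked up
# in a real thesaurus system.
THESAURUS = {
    "fish": ["fishes", "pisces"],
    "pollution": ["contamination"],
}


def expand_query(query, thesaurus):
    """Replace each word that has synonyms by an OR-group of word + synonyms."""
    groups = []
    for word in query.lower().split():
        terms = [word] + thesaurus.get(word, [])
        if len(terms) > 1:
            groups.append("(" + " OR ".join(terms) + ")")
        else:
            groups.append(word)
    return " AND ".join(groups)


print(expand_query("fish pollution estuary", THESAURUS))
```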

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
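
Recall and precision, as used above, can be computed as follows. This is a sketch with invented document identifiers; the function name precision_recall is chosen here. Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant,
# and 2 of the relevant ones were found.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)
```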

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
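
What manual truncation means can be sketched as follows, for a retrieval system that does support it. The vocabulary, the use of * as the truncation symbol and the function name truncate_match are assumptions for illustration; the symbol differs between systems.

```python
# Hypothetical list of words occurring in an index.
VOCABULARY = ["compute", "computer", "computers", "computing", "compost", "ocean"]


def truncate_match(pattern, vocabulary):
    """Right-hand truncation: 'comput*' matches every word starting with 'comput'.

    A pattern without a trailing * matches only the exact word.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [word for word in vocabulary if word.startswith(stem)]
    return [word for word in vocabulary if word == pattern]


print(truncate_match("comput*", VOCABULARY))
```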

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
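
A proximity operator such as NEAR can be sketched like this, for systems that offer it. The function name near and the default distance of 5 words are assumptions; real systems define their own distance.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True when word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == word_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


SENTENCE = "Open access journals give free access to scientific articles"
print(near(SENTENCE, "open", "journals", max_distance=2))
print(near(SENTENCE, "open", "articles", max_distance=3))
```

Compared with a plain AND, this rejects documents in which the two words both occur but far apart, which tends to improve precision.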

96

**--

Meta-search systems:
scheme 1
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram components: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
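
Clustering on the fly can be illustrated with a deliberately simple heuristic: each retrieved snippet is filed under the term that it shares with the most other snippets. This is NOT the method of Vivisimo or Teoma, which is not described here; the snippets, the stopword list and the function names are invented for the sketch.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "and", "his", "for", "of", "a", "in"}


def terms(snippet, query):
    """Words of a snippet, without stopwords and without the query word itself."""
    words = set(re.findall(r"[a-z]+", snippet.lower()))
    return words - STOPWORDS - {query.lower()}


def cluster_results(snippets, query):
    """Group snippets under a label computed only from the retrieved results."""
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(terms(snippet, query))
    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet, query)
        if candidates:
            # Most widely shared term wins; ties are broken alphabetically.
            label = min(candidates, key=lambda term: (-doc_freq[term], term))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical results for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free Pascal compiler for the Pascal programming language",
]
for label, members in cluster_results(SNIPPETS, "pascal").items():
    print(label, len(members))
```

Nothing is prepared before the search: the labels are derived from the result set itself, which is what "on the fly" means above.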

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram components: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
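
Sending one query to several search systems in parallel and integrating the results, with removal of repeated results, can be sketched as follows. The two "engines" are stand-in functions that return fixed lists; no real search system is contacted, and the names engine_one, engine_two, normalize and meta_search are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def engine_one(query):
    """Stand-in for a primary search system (returns a fixed ranked list)."""
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]


def engine_two(query):
    """Stand-in for a second primary search system."""
    return ["http://example.org/b/", "http://EXAMPLE.org/d", "http://example.org/a"]


def normalize(url):
    """Key for spotting repeated results: host case and trailing slash ignored."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def meta_search(query, engines):
    """Query all engines in parallel threads, then interleave and de-duplicate."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max(len(results) for results in result_lists)
    for rank in range(longest):
        # Take the rank-1 hits of every engine first, then the rank-2 hits, ...
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


for url in meta_search("marine biology", [engine_one, engine_two]):
    print(url)
```

Interleaving by rank is only one possible way of integrating the lists; a real meta-search system may weight the primary systems differently.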

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram components:
• Internet: telnet, ftp, ...
• WWW: static texts that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
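
How tracking a modified page can work is sketched below in Python. This is only an illustration under stated assumptions, not how any of the services named in these slides is implemented: the function names, the example URLs and the choice of a SHA-256 fingerprint are all hypothetical, and the fetching of the pages is left out.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of the text content of a page."""
    # Collapse whitespace so that layout-only changes do not raise an alert.
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous: dict[str, str], current_texts: dict[str, str]) -> list[str]:
    """Return the URLs that are new or whose content has changed.

    previous:      URL -> fingerprint stored at the last check
    current_texts: URL -> page text as fetched now (fetching is not shown)
    """
    alerts = []
    for url, text in current_texts.items():
        if previous.get(url) != fingerprint(text):
            alerts.append(url)
    return sorted(alerts)


# Hypothetical example data; no network access is needed.
stored = {"http://example.org/a": fingerprint("Open access  sources")}
fetched = {
    "http://example.org/a": "Open access sources",        # whitespace change only
    "http://example.org/b": "A page that did not exist",   # new page
}
print(changed_pages(stored, fetched))
```

A server-based alert service would run such a comparison periodically and send the resulting list to the subscriber by email.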

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
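
The matching of new books against a stored interest profile can be sketched as follows. This is a minimal Python sketch with hypothetical book records and field names; it does not describe how Amazon or any other bookshop actually implements its alerts.

```python
def matches_profile(book: dict, keywords: set[str], categories: set[str]) -> bool:
    """Return True when a new book fits the stored interest profile.

    A book fits when one of the profile keywords occurs in its title,
    or when its subject category is one of the profile categories.
    """
    title_words = {word.strip(".,:;!?").lower() for word in book["title"].split()}
    return bool(title_words & keywords) or book["category"] in categories


# Hypothetical profile, as it could be stored on the server of the system.
profile_keywords = {"oceanography", "aquaculture"}
profile_categories = {"Marine science"}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Modern Cookery", "category": "Food"},
    {"title": "Coastal Management", "category": "Marine science"},
]

alerts = [
    book["title"]
    for book in new_books
    if matches_profile(book, profile_keywords, profile_categories)
]
print(alerts)
```

The first book is selected through a keyword and the third through its subject category, which shows the two forms of profile mentioned above.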

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
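
The principle of simultaneous searching can be illustrated with a small Python sketch. The three catalogues below are hypothetical in-memory stand-ins for remote library and bookshop systems; a real meta-search service would send the query over the network (for instance using Z39.50) and would have to cope with target systems that are unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues: name -> list of (title, year).
CATALOGUES = {
    "library_a": [("Marine Ecology", 1998), ("Fish Biology", 1985)],
    "library_b": [("Marine Ecology", 1998), ("Coastal Engineering", 2001)],
    "bookshop_c": [("Marine Pollution", 2003)],
}


def search_catalogue(name: str, word: str) -> list[tuple[str, int]]:
    """Search one catalogue with a simple title-word match.

    Only this lowest common denominator is offered, which mirrors the
    limited search options of meta-search services.
    """
    return [record for record in CATALOGUES[name] if word.lower() in record[0].lower()]


def meta_search(word: str) -> list[tuple[str, int]]:
    """Query all catalogues in parallel and merge the results without duplicates."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_catalogue(name, word), CATALOGUES))
    merged = {record for results in result_lists for record in results}
    return sorted(merged)


print(meta_search("marine"))
```

The title held by two libraries appears only once in the merged result, while the coverage is the union of all three catalogues.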

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
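
The use of thesaurus relations to formulate a query can be sketched in Python as follows. The thesaurus entry below is a hypothetical example and is not taken from any of the systems mentioned in these slides; the OR syntax of the resulting query is also an assumption, because each database has its own query language.

```python
# Hypothetical thesaurus: term -> relations (BT, NT, RT, UF as defined above).
THESAURUS = {
    "sea": {
        "BT": ["water body"],   # broader term
        "NT": ["coastal sea"],  # narrower term
        "RT": ["tide"],         # related term
        "UF": ["ocean"],        # use(d) for: treated here as a synonym
    },
}


def expand_query(term: str, relations: tuple[str, ...] = ("UF", "NT")) -> str:
    """Build an OR query from a term and the terms found through the thesaurus.

    Adding synonyms (UF) and narrower terms (NT) aims at higher recall;
    adding broader or related terms would increase recall further,
    but probably at the cost of precision.
    """
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "NT", "RT")))
```

A term that is not in the thesaurus is simply returned unchanged, so the sketch can be applied to every word of a query.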

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
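
Both directions of citation searching can be illustrated with a small Python sketch. The citation data are hypothetical; in practice the references come from the documents themselves and the citing documents from a citation index.

```python
# Hypothetical citation data: document -> documents cited in its reference list.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str, depth: int = 2) -> list[str]:
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for document in frontier:
            for reference in CITES.get(document, []):
                if reference != seed and reference not in found:
                    found.append(reference)
                    next_frontier.append(reference)
        frontier = next_frontier
    return found


def cited_by(document: str) -> list[str]:
    """Look forwards in time: which documents cite the given document?

    This lookup is what a citation index offers.
    """
    return sorted(d for d, references in CITES.items() if document in references)


print(snowball("seed"))       # older publications reached through references
print(cited_by("seed"))       # more recent publications that cite the seed
print(len(cited_by("seed")))  # number of citations received by the seed
```

The last line shows that the same lookup also yields the number of citations received by a document.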

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
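
Generating the variants of a URL before a link search can be sketched in Python as follows. The function name is hypothetical and the host name is the placeholder used above; the query syntax to search for links differs between search systems and is therefore not included.

```python
def url_variants(url: str) -> list[str]:
    """Return the common spellings of the same address for a link search."""
    base = url
    # Remove a default file name, if present, to get the directory form.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    base = base.rstrip("/")
    # Without trailing slash, with trailing slash, and with the default file name.
    return [base, base + "/", base + "/index.html"]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then submitted as a separate link query, and the results are combined.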

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
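
The two components described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny collection of pages already collected and held in memory; the URLs, the texts and the function names are invented for the example, and real robots and indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" would have collected (invented for illustration).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
```

A query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.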

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
are owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
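
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a small iterative computation. The sketch below is a simplified scheme in the style of the publicly described PageRank idea; the damping value, the function name and the link graph are assumptions made for illustration, and this is not Google's actual ranking system.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or high-scoring pages, link to it.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph (invented): A, B and C all point to D; D points to A.
GRAPH = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
RANKS = link_rank(GRAPH)
```

In this invented graph, D receives the most links and ends up highest; A receives only one link, but from the "important" page D, so it ends up above B and C.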

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
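
What such an automatic expansion amounts to can be sketched as follows. This is a minimal illustration only: the synonym table and the function name are invented, and how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table (invented for illustration).
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand each word prefixed with '~' into an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the word itself.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)
```

With this table, the query `buy ~cheap ~car insurance` becomes `buy (cheap OR inexpensive OR affordable) (car OR automobile OR vehicle) insurance`; words without a tilde are left untouched.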

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
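
Recall and precision can be computed for a single search as soon as it is known which documents are relevant. Precision is the share of the retrieved documents that are relevant; recall is the share of the relevant documents that were retrieved. A minimal sketch, with an invented function name and invented document identifiers in the example:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved documents
    and a collection of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a search returns d1, d2, d3 and d4 while the relevant documents are d1, d2 and d5, then 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).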

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
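
To make clear what truncation and a proximity operator do in systems that offer them, here is a small sketch that applies both to a single text. The function names and the default distance of 3 words are assumptions chosen for illustration; they do not correspond to the syntax of any particular retrieval system.

```python
import re

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_truncation(text, stem):
    """True if any word in the text starts with the given stem (as in 'librar*')."""
    return any(word.startswith(stem.lower()) for word in tokenize(text))

def matches_near(text, word1, word2, max_distance=3):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = tokenize(text)
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)
```

With truncation, the stem "librar" matches "library", "libraries" and "librarian" in one query; with a proximity test, two words count as a match only when they stand close together in the text.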

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
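
Grouping results on the fly, using only the retrieved result list, can be illustrated with a toy example. The sketch below groups result titles under words that they share; it is a deliberately simple stand-in, not the method used by Vivisimo, and the stop word list, the titles and the function name are invented.

```python
import re
from collections import defaultdict

# Small stop word list (an assumption; real systems use far longer lists).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}

def cluster_results(query, titles, min_size=2):
    """Group result titles under words (other than the query words) that they share."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    groups = defaultdict(list)
    for title in titles:
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        for word in words - query_words - STOPWORDS:
            groups[word].append(title)
    # Keep only the labels shared by at least min_size results.
    return {label: members for label, members in groups.items() if len(members) >= min_size}

# Hypothetical result titles for the ambiguous query "pascal" (invented).
TITLES = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Learn the Pascal programming language",
]
CLUSTERS = cluster_results("pascal", TITLES)
```

Here the titles about the philosopher end up under one heading and those about the programming language under others, without any processing of the documents before the search.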

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
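
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, assuming each primary search system returns a ranked list of URLs; the normalisation rules (ignoring "www." and a trailing slash) and the function names are simplifying assumptions, and query strings are ignored.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparable key: lower-case host without 'www.',
    followed by the path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged
```

Interleaving by rank keeps the top result of every engine near the top of the merged list; a result that several engines return is shown only once, at the position of its first appearance.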

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
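
The difference between tracking a modified page and finding a new page can be sketched in a few lines. This is only a minimal illustration, assuming that the page texts have already been downloaded into two snapshots; the function names and the example.org addresses are hypothetical and do not describe how any particular alert service works.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the page text, used to detect modifications."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_snapshots(old: dict, new: dict) -> dict:
    """Compare two snapshots that map a URL to the text of that page.

    Returns the URLs that are new and the URLs whose text was modified.
    """
    added = sorted(url for url in new if url not in old)
    modified = sorted(
        url
        for url in new
        if url in old and fingerprint(old[url]) != fingerprint(new[url])
    )
    return {"new": added, "modified": modified}


# Hypothetical snapshots taken on two consecutive days.
yesterday = {
    "http://example.org/a": "tides",
    "http://example.org/b": "waves",
}
today = {
    "http://example.org/a": "tides and currents",
    "http://example.org/b": "waves",
    "http://example.org/c": "plankton",
}

report = compare_snapshots(yesterday, today)
print(report)
```

A real service would also have to fetch the pages and send the report by email; both steps are left out here.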

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: ?
To find book titles published before 1990: ?
To find a book title through a title search: ?
To find the price of a book: ?
To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
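
Matching a new book against a stored interest profile can be sketched as follows. This is a minimal illustration under stated assumptions: the classes `Profile` and `Book`, the matching rule (any keyword in the title, or the same subject category) and the sample titles are all hypothetical, and they do not describe how Amazon or any other service implements its alerts.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Interest profile stored on the server: lower-case keywords and/or categories."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


@dataclass
class Book:
    title: str
    category: str


def matches(profile: Profile, book: Book) -> bool:
    """True when a title word is a profile keyword or the category is in the profile."""
    words = {w.strip(".,:;").lower() for w in book.title.split()}
    return bool(words & profile.keywords) or book.category in profile.categories


def alerts(profile: Profile, new_books: list) -> list:
    """Return the titles of the new books that fit the profile."""
    return [book.title for book in new_books if matches(profile, book)]


# Hypothetical profile and hypothetical list of newly published books.
profile = Profile(keywords={"oceanography"}, categories={"Marine biology"})
new_books = [
    Book("Introduction to Oceanography", "Earth sciences"),
    Book("Coral Reefs", "Marine biology"),
    Book("Medieval History", "History"),
]

print(alerts(profile, new_books))
```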

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
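
The idea of one search action over several target systems can be sketched as follows. This is a simplified illustration, not the method of any existing meta-search service: the three target functions are hypothetical stand-ins for real catalogues, the targets are queried one after the other rather than in parallel, and merging on ISBN is an assumption made for the example.

```python
def search_all(targets: dict, query: str):
    """Send one query to every target system and merge the records by ISBN.

    A target that is unavailable is skipped and reported, so the result
    depends on the availability of the target systems.
    """
    merged = {}
    failed = []
    for name, search in targets.items():
        try:
            records = search(query)
        except ConnectionError:
            failed.append(name)
            continue
        for record in records:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(name)
    return merged, failed


# Hypothetical target systems with made-up records.
def library_a(query):
    return [{"isbn": "111", "title": "Marine Ecology"}]


def bookshop_b(query):
    return [
        {"isbn": "111", "title": "Marine Ecology"},
        {"isbn": "222", "title": "Ocean Currents"},
    ]


def library_c(query):
    raise ConnectionError("target system not available")


targets = {"library_a": library_a, "bookshop_b": bookshop_b, "library_c": library_c}
merged, failed = search_all(targets, "marine")
print(merged)
print(failed)
```

Only the query string is passed to every target, which reflects why the search options of such a service tend to be limited to what all targets have in common.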

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general: Library of Congress, British Library, Infoball
To find the price of a book: Global Books in Print, Infoball, online bookshops
To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
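
The use of a thesaurus before searching can be sketched as a small query expansion step. This is a minimal illustration only: the thesaurus entry for "sea" is invented for the example, the function names are hypothetical, and the `OR` syntax of the resulting query is an assumption that has to be checked against the help pages of the database that will be searched.

```python
# Invented thesaurus entry using the relation codes UF, BT, NT and RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
    },
}


def expand(term: str, thesaurus: dict, relations=("UF", "NT")) -> list:
    """Return the term followed by the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_query(terms: list) -> str:
    """Join the terms with OR; terms of several words are quoted as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


terms = expand("sea", THESAURUS)
print(to_query(terms))
print(to_query(expand("sea", THESAURUS, relations=("BT",))))
```

Adding synonyms and narrower terms mainly aims at higher recall; choosing a single, more specific term from the thesaurus instead of the original word is the usual way to aim at higher precision.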

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
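
Both directions of citation searching can be sketched on a small citation graph. This is an illustration under stated assumptions: the document identifiers and the citations between them are invented, and the functions are hypothetical helpers, not the method used by any commercial citation index.

```python
from collections import deque

# Invented data: each document maps to the list of older documents it cites.
CITES = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "x": ["seed"],
    "y": ["x", "a"],
}


def snowball(seed: str, cites: dict) -> list:
    """Follow reference lists from the seed document back to older publications."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        document = queue.popleft()
        for reference in cites.get(document, []):
            if reference not in seen:
                seen.add(reference)
                found.append(reference)
                queue.append(reference)
    return found


def build_citation_index(cites: dict) -> dict:
    """Invert the reference lists: map each cited document to the documents citing it."""
    index = {}
    for document, references in cites.items():
        for reference in references:
            index.setdefault(reference, []).append(document)
    return index


older = snowball("seed", CITES)
citation_index = build_citation_index(CITES)
newer = citation_index.get("seed", [])
print(older)
print(newer)
```

The length of an entry in the inverted index, for instance `len(citation_index["a"])`, is the number of citations received by that document within this small collection.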

211

***-

Information retrieval:
using citations (scheme)
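[Scheme: a time axis running from past to future, with the seed document and the present moment (“now”) marked on it. Snowball citation searching leads from the seed document to the past; citation indexing leads from the seed document towards the future.]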

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
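
The three variants listed above can be generated from any one of them. This is a minimal sketch, assuming that `index.html` is the default document of the directory, as in the example on this slide; the function name is hypothetical, and how each variant is then entered in a link search depends on the search system used.

```python
def url_variants(url: str) -> list:
    """Return the three spellings under which the same page may be linked."""
    base = url
    # Strip a trailing "/index.html" or "/" to obtain the shortest spelling.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```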

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
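
The two ways in which a web page can cite a URL, as a hyperlink or as a string in the visible text, can be told apart with a short script. This is a minimal sketch using the HTML parser of the Python standard library; the class name, the function name and the two sample pages are hypothetical, and the target URL is the example string given above.

```python
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collect the href values of <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text.append(data)


def cites(page_html: str, url: str) -> dict:
    """Report whether the page links to the URL and whether its text contains it."""
    parser = LinkAndTextParser()
    parser.feed(page_html)
    parser.close()
    return {"link": url in parser.hrefs, "text": url in "".join(parser.text)}


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"

# Two hypothetical citing pages.
page_with_link = f'<p>See <a href="{TARGET}">this page</a>.</p>'
page_with_text = f"<p>Source: {TARGET} (visited 2005)</p>"

print(cites(page_with_link, TARGET))
print(cites(page_with_text, TARGET))
```

The second page would be missed by a search that looks only in hyperlink fields, which is why the more normal search for the URL string is a useful complement.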

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 158

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
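
The mechanism described above (a collected database of pages plus an automatically built index that is searched with queries) can be sketched as a tiny inverted index. This is only an illustration under stated assumptions: the pages and their URLs are hypothetical stand-ins for what a crawler would collect, and real search engines are far more elaborate.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a "robot" would collect.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))
```

The query is answered from the index alone, which is why such a system does not need to visit the pages in real time.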

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
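
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis sketch. This is not Google's actual implementation: the link graph, the damping value and the number of iterations are assumptions chosen only for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    and especially high-scoring pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C ranks first because three pages point to it, and A ranks above B and D because the "important" page C points to it.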

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
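
What such a synonym expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the sketch only rewrites the query text; it does not claim to reproduce how Google handles the tilde internally.

```python
# Hypothetical, hand-made synonym table; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No known synonyms: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
```

Only the word marked with a tilde is expanded; the other words are passed on unchanged.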

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
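
The two quality measures named on this slide, recall and precision, can be computed as in the following sketch. The document identifiers are hypothetical; in practice the set of relevant documents is rarely known completely, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant;
    recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set and relevance judgements.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```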

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
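
What manual truncation would offer can be sketched with a wildcard match against the words of an index. The vocabulary and the `*` truncation symbol are assumptions for illustration; retrieval systems that support truncation each have their own symbol and rules.

```python
from fnmatch import fnmatchcase

def truncation_search(pattern, vocabulary):
    """Return all index words that match a truncated query term such as 'librar*'."""
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))

# Hypothetical vocabulary of an index.
vocabulary = ["library", "libraries", "librarian", "liberty", "catalogue"]
print(truncation_search("librar*", vocabulary))
```

Without this feature, the searcher has to type the variants (library, libraries, librarian) explicitly to reach the same coverage.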

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
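
Clustering on the fly, using only the texts returned for a query, can be illustrated with a deliberately naive sketch. It is not Vivisimo's method: the grouping rule (the most widely shared non-query word in the title), the stop-word list and the sample titles are all assumptions made for illustration.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def content_words(title, query_words):
    """Words of a title, without query words and stop words."""
    return set(re.findall(r"[a-z]+", title.lower())) - query_words - STOPWORDS

def cluster_results(results, query):
    """Group result titles under the most widely shared term they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    # Count in how many titles each candidate term occurs.
    term_count = defaultdict(int)
    for title in results:
        for word in content_words(title, query_words):
            term_count[word] += 1
    clusters = defaultdict(list)
    for title in results:
        words = content_words(title, query_words)
        if not words:
            clusters["other"].append(title)
            continue
        # Pick the term shared with most titles; break ties alphabetically.
        label = min(words, key=lambda w: (-term_count[w], w))
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
results = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and his wager",
]
for label, titles in cluster_results(results, "pascal").items():
    print(label, "->", titles)
```

No document is processed before the search; the groups are derived from the result list itself.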

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
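
The integration of results with removal of repeated results can be sketched as follows. The result lists are hypothetical, and the rule for deciding that two URLs are "the same" (ignoring case of the host, a leading "www." and a trailing slash) is an assumption; real meta-search systems may compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path without trailing slash.
    The query string is ignored in this simple sketch."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.dmoz.org/", "http://bubl.ac.uk/link/"]
engine_2 = ["http://dmoz.org", "http://www.onelook.com/", "http://bubl.ac.uk/link"]
print(merge_results([engine_1, engine_2]))
```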

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet (telnet, ftp, ...) and, within it, the WWW]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
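
The distinction between modified and new pages can be illustrated with a minimal sketch in Python. It assumes that the page texts have already been fetched; the function names `fingerprint` and `detect_changes` are hypothetical and do not describe how any particular alert service works.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored: dict, current_pages: dict) -> dict:
    """Compare stored fingerprints with the current page texts.

    `stored` maps URL -> fingerprint from the previous run and is updated in place.
    `current_pages` maps URL -> page text (assumed to be fetched elsewhere).
    """
    report = {"new": [], "modified": []}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            report["new"].append(url)        # page not seen before
        elif stored[url] != digest:
            report["modified"].append(url)   # page seen before, content changed
        stored[url] = digest
    return report
```

A server-based alert service would run such a comparison periodically and send the report to the subscriber by email; a client program would do the same on the workstation of the user.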

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?

AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: ?

AIM: To find book titles published before 1990
RECOMMENDED SYSTEMS: ?

AIM: To find a book title through a title search
RECOMMENDED SYSTEMS: ?

AIM: To find the price of a book
RECOMMENDED SYSTEMS: ?

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
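
A minimal sketch in Python of how a stored interest profile could be matched against newly published books. The record fields (`title`, `category`), the profile layout and the function names are assumptions made for illustration; they do not describe the system of any bookshop.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True when a title word equals a profile keyword or the category is in the profile."""
    title_words = {w.strip(".,:;!?").lower() for w in book["title"].split()}
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in set(profile.get("categories", []))
    return keyword_hit or category_hit


def new_book_alerts(new_books: list, profile: dict) -> list:
    """Return the titles of the new books that fit the stored interest profile."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]
```

Keywords give a fine-grained profile but miss books whose titles use other words; subject categories are broader and depend on how well the system classifies its books.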

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
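
The principle of simultaneous searching can be sketched in Python as follows. The target catalogues are simulated here by in-memory lists, and the names `make_catalogue` and `meta_search` are hypothetical; real services query remote systems, for instance through Z39.50, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalogue(records):
    """Build a toy search function over a list of {'isbn', 'title'} records."""
    def search(query):
        q = query.lower()
        return [r for r in records if q in r["title"].lower()]
    return search


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results by ISBN.

    `targets` maps a target name to a search function.
    Returns (merged records keyed by ISBN, names of targets that failed).
    """
    def run(name):
        try:
            return name, targets[name](query), None
        except Exception as exc:  # an unavailable target must not break the whole search
            return name, [], exc

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        outcomes = list(pool.map(run, targets))

    merged, failed = {}, []
    for name, records, error in outcomes:
        if error is not None:
            failed.append(name)
            continue
        for record in records:
            entry = merged.setdefault(record["isbn"], {"title": record["title"], "sources": []})
            entry["sources"].append(name)
    return merged, failed
```

The list of failed targets makes visible why the result depends on the availability of the target systems, and the single query string shared by all targets shows why the search options are limited to what every target supports.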

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
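
The procedure described above can be sketched in Python. The thesaurus entries below are hypothetical and serve only as illustration, and the Boolean syntax (`OR`, `AND`, quoted phrases) is an assumption; the syntax of the target database should be checked in its help pages.

```python
# Hypothetical thesaurus entries, for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["body of water"],    # broader term
        "NT": ["coastal sea"],      # narrower term
        "RT": ["tide"],             # related term
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the terms found through the chosen relations."""
    entry = thesaurus.get(term.lower(), {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms, thesaurus, relations=("UF", "NT")):
    """Combine the expanded variants of each concept with OR, and the concepts with AND."""
    groups = []
    for term in terms:
        variants = expand_term(term, thesaurus, relations)
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")" if len(quoted) > 1 else quoted[0])
    return " AND ".join(groups)
```

Adding synonyms and narrower terms with OR tends to increase recall; choosing a more specific term found in the thesaurus instead of a vague one tends to increase precision.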

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
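
Both directions of citation searching can be sketched in Python on a small, made-up set of reference lists. The document names and the function names `snowball` and `citing` are hypothetical.

```python
# Hypothetical data: each document mapped to the documents in its reference list.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old1"],
}


def snowball(seed, references):
    """Follow reference lists back in time, starting from the seed document."""
    found, queue = [], list(references.get(seed, []))
    while queue:
        paper = queue.pop(0)
        if paper not in found and paper != seed:
            found.append(paper)
            queue.extend(references.get(paper, []))  # the snow-ball effect
    return found


def citing(paper, references):
    """What a citation index offers: the documents whose reference lists contain `paper`."""
    return sorted(source for source, refs in references.items() if paper in refs)
```

Snowballing needs only the documents themselves, whereas finding more recent citing documents requires an index built over the reference lists of many publications.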

211

***-

Information retrieval:
using citations (scheme)
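[Scheme: a time axis (Past, Now, Future) with the seed document on it.
Snowball citation searching leads from the seed document towards the past;
citation indexing leads from the seed document towards the future.]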

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
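
A minimal sketch in Python of how such counts could be derived once the reference lists are available. The choices to count each citing document only once and to ignore self-references are assumptions made for this illustration, not rules of any citation database; the function names are hypothetical.

```python
from collections import Counter


def citation_counts(references):
    """Count, for each document, the number of other documents that cite it.

    `references` maps a citing document to the documents in its reference list.
    """
    counts = Counter()
    for citing_doc, cited_docs in references.items():
        for cited in set(cited_docs):      # a citing document counts once
            if cited != citing_doc:        # a document citing itself is ignored
                counts[cited] += 1
    return counts


def author_citation_counts(article_counts, authors_of):
    """Sum the article counts for each author; `authors_of` maps article -> authors."""
    totals = Counter()
    for article, number in article_counts.items():
        for author in authors_of.get(article, []):
            totals[author] += number
    return totals
```

The result remains an estimate: it depends on which publications are covered by the database and on how well names of authors and references are matched.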

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
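
On an intranet or on a small collection of saved pages, the same idea can be tried without a search engine. The sketch below in Python assumes that the HTML of the pages is already available in a dictionary; the names `LinkCollector` and `pages_linking_to` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(pages, target):
    """Return the URLs of the pages (other than the target) that link to `target`.

    `pages` maps a page URL to its HTML; a trailing slash is ignored when comparing.
    """
    citing_pages = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        wanted = target.rstrip("/")
        if url != target and any(link.rstrip("/") == wanted for link in collector.links):
            citing_pages.append(url)
    return sorted(citing_pages)
```

The number of pages returned is a simple measure of the web citations received by the target page within that collection.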

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 159

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
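
The mechanism described above can be illustrated with a minimal sketch. It assumes that the "robot" has already collected a few pages (the URLs and texts below are hypothetical) and builds a simple word index that is then searched with an implicit AND between the query words. Real Internet indexes are far more elaborate; this only shows the principle.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected earlier.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine science")))  # ['http://example.org/b']
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such systems can respond quickly.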

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the
first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
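
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched with a simplified link-analysis score. This is only an illustration in the spirit of published link-analysis methods; it is not Google's actual ranking. The link graph, the damping value and the function name `link_rank` are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score: pages linked from many or important pages score higher."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = link_rank(links)
print(max(rank, key=rank.get))  # C
```

In this small graph, C scores highest because three pages point to it, and A scores higher than B or D because its only incoming link comes from the "important" page C.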

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
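
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a far larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "picture": ["image", "photo"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("cheap ~car rental"))
# cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.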

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
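
Recall and precision, the two measures named above, can be computed as in the following small sketch. The document identifiers are hypothetical; in practice the set of relevant documents is rarely known completely, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved items that are relevant;
    recall = share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall about 0.67).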

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
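
To make clear what is meant by truncation, the following minimal sketch shows manual right-truncation as it is offered by some other retrieval systems. The asterisk as truncation symbol, the term list and the function name are assumptions chosen for this illustration.

```python
def truncation_search(index_terms, pattern):
    """Return index terms matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(t for t in index_terms if t.lower().startswith(stem))
    # Without the truncation symbol only the exact term matches.
    return sorted(t for t in index_terms if t.lower() == pattern.lower())

terms = ["library", "libraries", "librarian", "liberal", "index"]
print(truncation_search(terms, "librar*"))
# ['librarian', 'libraries', 'library']
```

Without such a feature, the searcher has to type the variants explicitly, for instance combined with OR.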

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
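
A proximity operator such as NEAR can be sketched as follows: two words match only when they occur within a given number of word positions of each other. The function name and the default distance are assumptions for this example; systems that offer NEAR define the distance in their own way.

```python
def near(text, word1, word2, max_distance=3):
    """True when word1 and word2 occur within max_distance word positions."""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)

text = "open access journals in marine science"
print(near(text, "open", "journals", max_distance=2))  # True
print(near(text, "open", "science", max_distance=2))   # False
```

A proximity search is stricter than a Boolean AND, which would accept both pairs of words anywhere in the document.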

96

**--

Meta-search systems:
scheme 1
[Diagram with the elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the elements: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
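
Clustering of results on the fly can be illustrated with a very simple sketch that groups the retrieved snippets under the most frequent non-query word they contain. This is not the method used by Vivisimo, which is not described here; the stop word list, the snippets and the function name are assumptions for this illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most frequent non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for s in snippets:
        words = [w for w in s.lower().split()
                 if w not in STOPWORDS and w not in query_words]
        tokenised.append(words)
    # Count in how many snippets each word occurs.
    df = Counter(w for words in tokenised for w in set(words))
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Ties are broken alphabetically to keep the result deterministic.
            label = min(words, key=lambda w: (-df[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results("pascal", snippets)
print(sorted(clusters))  # ['philosopher', 'programming']
```

The grouping is computed from the retrieved results only, after the search, so no processing of the documents is needed beforehand.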

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the elements: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
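
The integration of results with removal of repeated results can be sketched as follows. The scoring by position in each list is only one possible choice and is an assumption for this illustration, as are the URLs; actual meta-search systems may merge results differently.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several engines, dropping repeated URLs."""
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # Treat URLs that differ only by a trailing slash as the same.
            key = url.rstrip("/")
            # Earlier positions contribute more to the combined score.
            score, first_seen = scores.get(key, (0.0, url))
            scores[key] = (score + 1.0 / (position + 1), first_seen)
    ranked = sorted(scores.values(), key=lambda item: -item[0])
    return [url for _, url in ranked]

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://example.org/x", "http://example.org/y"]
engine_b = ["http://example.org/y/", "http://example.org/z"]
print(merge_results([engine_a, engine_b]))
# ['http://example.org/y', 'http://example.org/x', 'http://example.org/z']
```

The page found by both engines appears only once and moves to the top of the merged list.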

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram with the elements: telnet, ftp, ...; Internet / WWW; Databases and file archives accessible through the Internet (CGI, ASP, ...); Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; Information accessible only when passwords are used; Rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
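
Tracking modifications in an existing page can be sketched as comparing a stored fingerprint of the page text with a fingerprint of the current text. This is only an illustration under the assumption that the page text has already been downloaded; the function names are hypothetical and the slides do not state how any particular tracking service works.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 fingerprint of the text, ignoring whitespace layout."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint: str, current_text: str) -> bool:
    """True when the current text differs from the version seen earlier."""
    return fingerprint(current_text) != stored_fingerprint


# Fingerprint stored at the previous visit (example text is invented).
stored = fingerprint("Opening hours: 9-17")
print(has_changed(stored, "Opening   hours: 9-17"))  # only the spacing differs
print(has_changed(stored, "Opening hours: 9-18"))    # the content differs
```

Storing only the fingerprint keeps the service small, but it can report that a page changed, not what changed.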

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
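
The principle of an alert service that works with stored queries can be sketched as follows: the server remembers which results it has already reported for a query and reports only the new ones at the next run. The class and the example URLs are hypothetical; this is not the implementation of Google Alert, and the step that runs the query against an index is left out.

```python
class StoredAlert:
    """A stored search query plus the results already reported for it."""

    def __init__(self, query):
        self.query = query
        self.seen = set()

    def check(self, current_urls):
        """Return the URLs not reported before and remember them."""
        fresh = [url for url in current_urls if url not in self.seen]
        self.seen.update(current_urls)
        return fresh


alert = StoredAlert("open access marine science")
first_run = alert.check(["http://one.example/", "http://two.example/"])
second_run = alert.check(["http://two.example/", "http://three.example/"])
print(first_run)
print(second_run)
```

In a real service, the list returned by `check` would be the content of the e-mail sent to the subscriber.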

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
• A9 (a search engine by Amazon):
http://a9.com
Both Amazon and A9 allow full-text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
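
An interest profile stored as keywords or subject categories can be sketched as a small data structure with a matching rule. The class, its field names and the example titles are hypothetical; the slides do not describe how Amazon or any other bookshop matches new books against profiles.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Interest profile of one user: keywords and/or subject categories."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)

    def matches(self, title: str, category: str) -> bool:
        """True when a title word or the subject category fits the profile."""
        title_words = {word.strip(".,:;!?").lower() for word in title.split()}
        wanted_words = {keyword.lower() for keyword in self.keywords}
        wanted_categories = {name.lower() for name in self.categories}
        return bool(wanted_words & title_words) or category.lower() in wanted_categories


profile = InterestProfile(keywords={"oceanography"}, categories={"marine science"})
print(profile.matches("Introduction to Oceanography", "Earth science"))
print(profile.matches("Coastal Zones", "Marine Science"))
print(profile.matches("Cooking Basics", "Food"))
```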

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
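
Simultaneous searching, and its dependence on the availability of the target systems, can be sketched with a thread pool that sends one query to several catalogues in parallel and collects failures separately. The two catalogue functions below are invented stand-ins; a real meta-search service would contact the catalogues over the network, for instance using Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(catalogues, query):
    """Send one query to all catalogues in parallel.

    catalogues maps a name to a function that takes the query and
    returns a list of titles. Returns (results, failures).
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(catalogues))) as pool:
        futures = {name: pool.submit(search, query) for name, search in catalogues.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as error:
                # An unavailable target system must not block the others.
                failures[name] = str(error)
    return results, failures


def library_a(query):
    """Stand-in for a library catalogue (hypothetical data)."""
    titles = ["Marine Ecology", "Ocean Physics"]
    return [title for title in titles if query.lower() in title.lower()]


def bookshop_b(query):
    """Stand-in for a target system that is currently unavailable."""
    raise ConnectionError("catalogue unavailable")


results, failures = search_all({"library_a": library_a, "bookshop_b": bookshop_b}, "ocean")
print(results)
print(failures)
```

Only a plain keyword query is passed on, which mirrors the limited search options mentioned above: features that not every target system supports cannot be offered.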

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database
does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
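
The use of a thesaurus before searching an uncontrolled database can be sketched as query expansion: the terms found through the thesaurus relations are combined with OR. The thesaurus entry below is invented for illustration, and the OR syntax with quoted phrases is an assumption; the syntax actually required depends on the database that is searched.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with the thesaurus terms found for it, using OR."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms mainly raises recall; adding related or broader terms widens the search further and may lower precision.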

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
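
Both directions of citation searching can be sketched on a small, invented set of reference lists: following references backwards (the snowball effect) and inverting the reference lists into a citation index to look forwards. The document identifiers and function names are hypothetical.

```python
def build_citation_index(references):
    """Invert reference lists: for each document, which documents cite it?"""
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, set()).add(citing)
    return cited_by


def snowball(seed, references, depth=2):
    """Collect older documents reached by following references from the seed."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        next_frontier = set()
        for document in frontier:
            next_frontier.update(references.get(document, []))
        frontier = next_frontier - found - {seed}
        found |= frontier
    return found


# Each document maps to the list of documents it cites (invented data).
references = {
    "D2005": ["C2001", "B1998"],
    "C2001": ["B1998", "A1990"],
    "E2006": ["C2001"],
}

print(sorted(snowball("D2005", references)))                  # older work
print(sorted(build_citation_index(references)["C2001"]))      # more recent work
```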

211

***-

Information retrieval:
using citations (scheme)
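[Scheme: a time axis from past to future, with the seed document at “now”.
Snowball citation searching leads from the seed document to the past;
citation indexing leads from the seed document to the future.]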

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
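
How citation counts for articles and for authors follow from a citation search can be sketched as simple counting over reference lists. The data and names below are invented, and a count obtained this way is only an estimate: it covers no more than the documents that the database happens to include.

```python
from collections import Counter


def citation_counts(references):
    """Count, for each article, the number of documents that cite it."""
    counts = Counter()
    for cited_list in references.values():
        # set() makes sure one citing document is counted only once.
        counts.update(set(cited_list))
    return counts


def author_citation_counts(references, authors):
    """Add up the citations received by all articles of each author."""
    totals = Counter()
    for article, count in citation_counts(references).items():
        for author in authors.get(article, []):
            totals[author] += count
    return totals


# Invented data: reference lists and authors of the cited articles.
reference_lists = {"P3": ["P1", "P2"], "P4": ["P1", "P1"]}
article_authors = {"P1": ["Smith"], "P2": ["Smith", "Jones"]}

print(citation_counts(reference_lists))
print(author_citation_counts(reference_lists, article_authors))
```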

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
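
Generating the variants listed above can be sketched with a small helper, so that none is forgotten when a link search is repeated for each spelling. The function name is hypothetical, and the handling of `index.htm` is an added assumption; the operator that the search system requires in front of each variant has to be taken from its help pages.

```python
def url_variants(url):
    """Return the common spellings under which the same page may be linked."""
    base = url
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```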

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
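
The “normal” search for a URL as a text string can be sketched as a scan over page texts. The pages below are invented, and a real search engine would use its index instead of scanning every text; the sketch only shows that a page may mention a URL in its text without containing a hyperlink to it.

```python
def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the URL string."""
    # Search without the scheme, so that mentions with and without
    # "http://" are both found.
    needle = url.lower().split("://", 1)[-1]
    return sorted(address for address, text in pages.items() if needle in text.lower())


# Invented page texts, keyed by the address of the page.
pages = {
    "http://review.example/": "See www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://other.example/": "Nothing relevant here.",
}

target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(pages_mentioning(target, pages))
```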

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 160

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information sources
• dictionaries and encyclopedias

• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
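
Browsing along a tree structure can be pictured with a small sketch. The categories and site names below are invented for illustration only and are not taken from Yahoo! or any other real directory.

```python
# Hypothetical mini-directory: nested categories (dicts) ending in lists of sites.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["site-a.example", "site-b.example"]},
        "Earth sciences": {"Hydrology": ["site-c.example"]},
    },
    "Reference": {"Dictionaries": ["site-d.example"]},
}


def browse(directory, path):
    """Follow a list of category names down the tree and return what is found there."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category does not exist
    return node


def subcategories(node):
    """Names a user could click on next; a leaf (a list of sites) has none."""
    return sorted(node) if isinstance(node, dict) else []
```

For instance, `browse(DIRECTORY, ["Science", "Biology", "Marine biology"])` returns the two sites filed under that heading, without any query being formulated.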

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
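
The "simply stated" version of this ranking can be sketched in a few lines. This is only an illustration of counting inbound links, under the assumption that a list of (source, target) links is available; the function name and the data format are hypothetical, and the real link analysis by Google is more refined.

```python
def rank_by_inbound_links(entries, links):
    """Order directory entries by how many distinct pages link to them.

    entries: iterable of site names listed in one directory category.
    links:   iterable of (source, target) pairs, an assumed link graph.
    """
    inbound = {entry: set() for entry in entries}
    for source, target in links:
        if target in inbound and source != target:
            inbound[target].add(source)
    # Most-linked first; ties fall back to alphabetical order.
    return sorted(inbound, key=lambda e: (-len(inbound[e]), e))
```

A site without any inbound link is still listed, but at the end, in alphabetical order among its equals.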

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
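
The two components described above (an automatically built index and a search function on top of it) can be illustrated with a minimal sketch. The page texts are assumed to have been collected already by a robot; the function names and the implicit AND between query words are choices made for this example, not a description of any particular search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it.

    pages: dict of URL -> page text, standing in for what a robot collected.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (implicit AND), sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.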

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that at most 10 words could be used in a query.)
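
Both ranking criteria (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style calculation. This is a sketch under assumptions: the damping value, the number of iterations and the data format are chosen for the example, and it is not claimed to be the algorithm as Google runs it.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    links: dict of page -> list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score
```

In a graph where three pages point to a page "hub", and "hub" in turn points to one page, that last page ends up with a higher score than a page that receives its single link from an unlinked page: a link from an "important" page counts for more.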

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
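
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the sketch only shows the principle of replacing a marked word by an OR-group; how Google implements this internally is not described here.

```python
# Hypothetical synonym table; a real system would use a far larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the bare word
        else:
            expanded.append(term)
    return " ".join(expanded)
```

With this table, the query `~cheap hotel` becomes `(cheap OR inexpensive OR affordable) hotel`, which can only retrieve the same or more documents than the original query.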

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage

• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005

• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
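
Recall and precision, as mentioned above, can be computed for a single search once it is known which documents are really relevant. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the function name and the document identifiers in the usage note are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents the search system returned.
    relevant:  documents that actually answer the information need.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    found = retrieved & relevant
    precision = len(found) / len(retrieved) if retrieved else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall
```

If a system returns four documents of which two are relevant, while three relevant documents exist in total, precision is 2/4 and recall is 2/3.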

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
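
To make clear what is missing, here is a minimal sketch of manual right-truncation as offered by many classical retrieval systems. The `*` symbol, the function name and the word list are assumptions for the example.

```python
def truncation_search(pattern, vocabulary):
    """Match a right-truncated term such as 'librar*' against indexed words."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word is matched.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())
```

A query for `librar*` would thus cover library, libraries and librarian in one step, whereas without truncation each word form has to be entered separately.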

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
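
A proximity operator such as NEAR can be pictured with the sketch below, which checks whether two words occur within a given number of words of each other. The function name and the default distance are assumptions; systems that offer NEAR each define their own distance.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)
```

Such a test is stricter than a Boolean AND, which is satisfied as soon as both words occur anywhere in the document.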

96

**--

Meta- search systems:
scheme 1
User (In / Out)
→ Client computer + WWW client program
→ WWW server computer
→ Internet / WWW
→ WWW server computers with Internet search systems

97

**--

Meta- search systems:
scheme 2
User (In / Out)
→ Client computer + Multi-threaded Internet search client program
→ Internet / WWW
→ WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
User (In / Out)
→ Client computer + WWW client program
→ WWW server computer
→ Internet / WWW
→ WWW server computers with Internet search systems

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
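
Grouping results on the fly, using only the retrieved snippets, can be illustrated with a deliberately naive sketch. It puts each snippet under the most frequent keyword it contains; the stop-word list, the function name and the grouping rule are assumptions for this example and are not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is"}


def cluster_results(query, snippets, max_clusters=3):
    """Group result snippets under the most frequent non-query keywords."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def keywords(snippet):
        words = set(re.findall(r"\w+", snippet.lower()))
        return words - query_words - STOPWORDS

    # Count in how many snippets each keyword occurs.
    counts = Counter(word for s in snippets for word in keywords(s))
    # Most frequent keywords become headings; ties are broken alphabetically.
    headings = sorted(counts, key=lambda w: (-counts[w], w))[:max_clusters]

    clusters = defaultdict(list)
    for snippet in snippets:
        # Put each snippet under the first (most frequent) heading it contains.
        heading = next((h for h in headings if h in keywords(snippet)), "other")
        clusters[heading].append(snippet)
    return dict(clusters)
```

For an ambiguous query such as pascal, snippets about the programming language and snippets about Blaise Pascal would end up under different headings, so that the user can pick the intended meaning.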

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
→ Client computer + Multi-threaded Internet search client program
→ Internet / WWW
→ WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
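
The integration of results with removal of repeated results can be sketched as follows. The engine names, the crude URL normalisation and the round-robin merging order are assumptions made for this illustration; real meta-search systems each use their own merging rules.

```python
def normalise(url):
    """Crude URL normalisation so trivially different addresses compare equal."""
    url = url.strip()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists from several engines, dropping repeats.

    result_lists: dict of engine name -> ranked list of URLs.
    Returns a list of (url, engines) tuples in merged order.
    """
    merged = {}  # normalised URL -> (first URL seen, list of engines)
    longest = max((len(r) for r in result_lists.values()), default=0)
    for rank in range(longest):
        for engine, results in result_lists.items():
            if rank < len(results):
                key = normalise(results[rank])
                if key not in merged:
                    merged[key] = (results[rank], [])
                merged[key][1].append(engine)
    return list(merged.values())
```

With this normalisation, `http://www.dogpile.com` and `http://dogpile.com/` count as one result, and the merged list records which engines returned it.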

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet           Global Internet           Multi-threaded
directories               indexes                   search systems

• Only a limited          • About 1/3 of the        • These get information
  selection of Internet     Internet is covered by    from directories
  sources                   an index                  and indexes

• Browsing                • Searching requires      • Searching requires
  information sources       some skills and           some skills and
  is easy                   knowledge                 knowledge

• Good for broad          • Good for specific,      • Good when even 1
  searches                  narrow searches           index does not yield
                                                      information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet (telnet, ftp, ..., WWW), with these labelled areas:]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files; PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
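How such a tracking service could distinguish new pages from changed pages can be sketched as follows. This is a minimal illustration under stated assumptions, not the way Copernic or any alert service actually works: the function names are hypothetical, page texts are assumed to have been fetched already, and a change is detected by comparing a hash of the text with the hash stored at the previous check.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the text, ignoring whitespace differences."""
    collapsed = " ".join(page_text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def check_pages(stored, current_pages):
    """Compare the current page texts with the stored fingerprints.

    stored: dict mapping URL to the fingerprint of the previous check
            (updated in place, so it can be reused for the next check)
    current_pages: dict mapping URL to the page text as fetched now
    Returns a tuple (new_urls, changed_urls).
    """
    new_urls, changed_urls = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls


if __name__ == "__main__":
    memory = {}
    # First check: every page is new. The URL is a placeholder.
    print(check_pages(memory, {"http://example.org/a": "Version 1"}))
    # Second check: the text was modified, so the page is reported as changed.
    print(check_pages(memory, {"http://example.org/a": "Version 2"}))
```

An alert service would run such a check periodically and e-mail the two lists to the subscriber.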

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
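Matching new books against a stored interest profile can be sketched as follows. The record layout (a title and one subject category per book), the profile layout and the function names are assumptions made for this illustration only; they do not describe how Amazon or any other bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """Return True when a book record fits the stored interest profile.

    A book fits when one of the profile keywords occurs as a word in the
    title, or when its subject category is one of the profile categories.
    """
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def alerts_for(new_books, profile):
    """Return the titles of the new books that should trigger an alert."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


if __name__ == "__main__":
    # Hypothetical profile and hypothetical new book records.
    profile = {"keywords": ["ecology"], "categories": ["oceanography"]}
    new_books = [
        {"title": "Marine Ecology of the North Sea", "category": "biology"},
        {"title": "Waves and Tides", "category": "oceanography"},
        {"title": "Medieval Cooking", "category": "history"},
    ]
    for title in alerts_for(new_books, profile):
        print(title)
```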

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Scheme: Author / Sender -> Editor -> Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: the relations of a Term]
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
added descriptors (words and terms) taken from a
controlled list, the searcher can first consult one or
several thesaurus systems to find additional and more
suitable words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
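This use of a thesaurus before searching can be sketched as follows. The small thesaurus entry is invented for the illustration, and the function names expand and to_query are hypothetical; the OR syntax with quoted phrases is accepted by many search systems, but the manual of the target database should be checked.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand(term, thesaurus, relations=("UF", "NT", "RT")):
    """Return the term followed by the terms found through the relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms


def to_query(terms):
    """Combine the terms with OR; phrases of several words are quoted."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(to_query(expand("sea", THESAURUS)))
```

Adding synonyms and related terms with OR mainly raises recall; replacing a broad term by a narrower term from the thesaurus is one way to raise precision.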

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
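Both directions of citation searching can be sketched on a small, invented citation network. The document names and the functions snowball and cited_by are hypothetical; a real citation index holds the same kind of links for millions of documents.

```python
from collections import deque

# Hypothetical data: each document is mapped to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "old2"],
}


def snowball(seed, cites):
    """Backward search: follow the reference lists, starting from the seed.

    The references of the seed are followed, then the references of those
    documents, and so on (the snowball effect). Returns the older documents
    in the order in which they are found.
    """
    found, seen = [], {seed}
    queue = deque([seed])
    while queue:
        document = queue.popleft()
        for reference in cites.get(document, []):
            if reference not in seen:
                seen.add(reference)
                found.append(reference)
                queue.append(reference)
    return found


def cited_by(target, cites):
    """Forward search: the more recent documents that cite the target.

    This inverts the links, which is what a citation index offers.
    """
    return sorted(doc for doc, refs in cites.items() if target in refs)


if __name__ == "__main__":
    print("Older documents:", snowball("seed", CITES))
    print("Citing documents:", cited_by("seed", CITES))
```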

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
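The variants listed above can be generated automatically before running a link search. This is a small sketch; the function name url_variants is hypothetical, and the assumption that the default page is called index.html or index.htm does not hold for every web server.

```python
def url_variants(url):
    """Return common spellings of the same page address.

    The original URL comes first, followed by the variant with a trailing
    slash and the variant without it, when these differ from the original.
    """
    variants = [url]
    base = url
    # Strip an assumed default file name, keeping the slash before it.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    with_slash = base if base.endswith("/") else base + "/"
    without_slash = with_slash.rstrip("/")
    for variant in (with_slash, without_slash):
        if variant not in variants:
            variants.append(variant)
    return variants


if __name__ == "__main__":
    for variant in url_variants("//web-server-computer.country/website/index.html"):
        print(variant)
```

Each variant is then submitted as a separate link search, and the result lists are combined.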

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 161

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
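
The mechanism described above (a database of pages plus a machine-built index that is searched with queries) can be illustrated with a minimal sketch. The page texts and the example.org URLs are invented for illustration, and the collecting "robot" is replaced by a small hard-coded dictionary; real Internet indexes are far larger and more sophisticated.

```python
import re
from collections import defaultdict

# Hypothetical miniature "crawl": URL -> page text (invented for illustration).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Pascal programming language tutorial",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words (implicit Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only; the pages themselves are not contacted at search time, which is why such a system can respond quickly.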

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
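
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched with a simplified PageRank-style iteration. This is only an illustration under stated assumptions: the link graph, the damping value 0.85 and the number of iterations are chosen for the example, and the actual ranking used by Google involves many more factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping each page to its score (scores sum to 1)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: C receives many links; D receives one link, from C.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["C"], "E": ["A"]}

RANK = pagerank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this toy graph, C comes first because many pages point to it, and D ends up above A although both receive a single link, because the link to D comes from the "important" page C.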

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
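
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name expand_query are hypothetical and serve only as an illustration; how Google selects synonyms internally is not documented here.

```python
# Hypothetical synonym table; a real system would rely on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("rent ~cheap ~car brussels"))
# rent (cheap OR inexpensive OR affordable) (car OR automobile OR vehicle) brussels
```

Words without a tilde are left untouched, so the searcher decides which concepts are broadened.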

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
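
The terms recall and precision used above can be made concrete with a small calculation. The document identifiers are invented for illustration; in practice the set of all relevant documents is rarely known exactly, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of the retrieved documents that are relevant.
    Recall = fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d2", "d5"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(precision)  # 0.5: 2 of the 4 retrieved documents are relevant
print(recall)     # 2 of the 3 relevant documents were found
```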

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
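
The first item, a proximity operator, can be sketched as follows. The function name near and the default distance of 5 words are assumptions for the illustration; systems that offer NEAR define the allowed distance in their own way.

```python
import re

def near(text, word1, word2, max_distance=5):
    """Return True when word1 and word2 occur within max_distance words
    of each other in the text, in either order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

TEXT = "Open access journals in marine science"
print(near(TEXT, "open", "journals", max_distance=2))  # True
print(near(TEXT, "open", "science", max_distance=2))   # False
```

A plain Boolean AND would accept both word pairs, since it only checks that the words occur somewhere in the same document.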

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
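
Clustering on the fly can be illustrated with a deliberately naive sketch that groups result titles under the term they share with most other results. This is not the method used by Vivisimo, which is not described in these slides; the titles, the stop word list and the function name cluster_results are invented for the example.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "by", "on"}

def cluster_results(query, titles):
    """Group titles under the non-query term shared by most results."""
    query_words = set(query.lower().split())

    def terms(title):
        return {w for w in re.findall(r"[a-z]+", title.lower())
                if w not in STOPWORDS and w not in query_words}

    # Count in how many titles each term occurs.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(terms(title))

    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most widely shared term first; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Blaise Pascal philosopher and mathematician",
    "The philosophy of Blaise Pascal",
    "Pascal programming language tutorial",
    "Free Pascal programming compiler",
]

for label, group in cluster_results("pascal", TITLES).items():
    print(label, "->", group)
```

Only the retrieved titles are analysed, after the search, so no pre-processing of the document collection is needed.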

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
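
The integration of results with removal of repeated results can be sketched as follows. The result lists are invented, the round-robin interleaving is just one possible merging strategy, and the URL normalisation (ignoring "www.", the case of the host name and a trailing slash) is a simplifying assumption that does not hold for every site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Crude URL normalisation: lower-case host, drop 'www.' and trailing '/'.
    The path keeps its case, because paths can be case-sensitive."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_A = ["http://www.example.org/a/", "http://www.example.org/b/"]
ENGINE_B = ["http://example.org/b", "http://example.org/c", "http://EXAMPLE.org/a"]

print(merge_results([ENGINE_A, ENGINE_B]))
# ['http://www.example.org/a/', 'http://example.org/b', 'http://example.org/c']
```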

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme of the Internet with the following parts:
• telnet, ftp, ...
• WWW
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
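
Tracking modifications in an existing page can be sketched as a comparison of fingerprints. This is a minimal sketch under assumptions: the page text is supposed to have been fetched already, the function names are hypothetical, and actual alert services may work quite differently.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page text after collapsing whitespace, so layout-only changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with the fingerprint of the page as fetched now."""
    return fingerprint(current_text) != stored_fingerprint

# Hypothetical page contents at three moments in time.
stored = fingerprint("Marine science   news\nMay 2005")
unchanged = has_changed(stored, "Marine science news May 2005")
changed = has_changed(stored, "Marine science news June 2005")
```

Only the fingerprint has to be stored between visits, not the page itself.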

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
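
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration only: the mini-thesaurus, its entries and the function name are hypothetical, and the Boolean OR syntax is assumed to be accepted by the database that is searched.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["lagoon"], "BT": ["water body"], "RT": ["coast"]},
}

def expand_query(term, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and its thesaurus neighbours."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea")
broad_query = expand_query("sea", relations=("UF", "NT", "BT"))
```

Adding synonyms and narrower terms mainly serves recall; adding broader terms widens the search further and may lower precision.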

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
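
Both directions of citation searching can be sketched with a small data set. This is a minimal sketch under assumptions: the document names and reference lists are hypothetical, and real citation indexes are built from far larger and less clean data.

```python
# Hypothetical data: each document is mapped to the documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "old_1990"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["old_1995", "old_1990"],
    "old_1995": ["old_1990"],
}

def build_citation_index(references):
    """Invert the reference lists: cited document -> sorted list of citing documents."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}

def snowball(references, seed):
    """Follow reference lists backwards in time, collecting every older document reached."""
    found, queue = [], [seed]
    while queue:
        current = queue.pop(0)
        for cited in references.get(current, []):
            if cited not in found:
                found.append(cited)
                queue.append(cited)
    return found

citation_index = build_citation_index(REFERENCES)
older = snowball(REFERENCES, "seed_2000")   # backwards: older publications
newer = citation_index["seed_2000"]         # forwards: more recent citing publications
```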

211

***-

Information retrieval:
using citations (scheme)
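[Scheme on a time axis (Past, Now, Future): starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.]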

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
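
Estimating the number of citations from the results of a citation search can be sketched as follows. The data and names are hypothetical, and the count is only an estimate, because it depends on the coverage of the database that was searched.

```python
from collections import Counter

# Hypothetical result of a citation search: one (citing document, cited author) pair per hit.
CITATIONS = [
    ("paper_a", "Author X"),
    ("paper_b", "Author X"),
    ("paper_b", "Author Y"),
    ("paper_c", "Author X"),
    ("paper_a", "Author X"),  # repeated hit, counted only once below
]

def citations_per_author(citations):
    """Count the distinct citing documents for each cited author."""
    return Counter(author for _, author in set(citations))

counts = citations_per_author(CITATIONS)
```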

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
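
Generating the variants to search for can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and the assumption that the default page is named index.html or index.htm does not hold for every web server.

```python
def url_variants(url):
    """Return the common spellings under which one web page may be linked."""
    # Drop the scheme so that the variants start with "//", as in the list above.
    base = url.split("://", 1)[-1]
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    base = base.rstrip("/")
    return ["//" + base + "/index.html", "//" + base + "/", "//" + base]

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be used in a separate link search, following the query syntax of the chosen search system.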

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
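The simplified statement above (more links received, higher rank) can be illustrated with a small sketch. The link graph and site names below are invented for the illustration, and the real ranking used by Google is certainly more elaborate than a plain count of inbound links.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-c": [],
    "site-d": ["site-c", "site-b"],
}


def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of distinct sites linking to them."""
    inlinks = Counter(
        target for targets in links.values() for target in set(targets)
    )
    # Most linked first; ties are broken alphabetically.
    return sorted(entries, key=lambda site: (-inlinks[site], site))


entries = ["site-a", "site-b", "site-c"]
print("alphabetical:", sorted(entries))
print("by inlinks:  ", rank_by_inlinks(entries, LINKS))
```

In this toy graph the alphabetical order (as in DMOZ) and the link-based order (as in the Google version) are exactly reversed, which shows how much the ordering can change.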

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram] The complete WWW: a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
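The mechanism described above can be sketched as a toy program. It assumes that the “robot” has already collected three pages (the URLs and texts are invented), builds a word index from their contents, and answers a query from that index only, without visiting the pages again. Real Internet indexes work on a vastly larger scale and add ranking; combining the query words with AND is a choice made for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science resources",
    "http://example.org/c": "Civil engineering handbook",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the stored index, which is why a search engine can only find pages that its robot has visited and why results can be out of date.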

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
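The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis iteration. This is a sketch in the spirit of published link-analysis methods, not the actual algorithm used by Google; the graph, the damping value 0.85 and the function name link_scores are assumptions made for the illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score over the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = links.get(page) or pages
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical graph: "hub" receives two links; "c" and "e" receive one each.
LINKS = {
    "a": ["hub"],
    "b": ["hub"],
    "hub": ["c"],
    "d": ["e"],
}

SCORES = link_scores(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

Pages "c" and "e" each receive exactly one link, but "c" ends up with the higher score because its link comes from the well-linked page "hub".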

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
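What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name expand_query are hypothetical; how Google selects synonyms internally is not documented here, so this only shows the principle of replacing a marked word by an OR-group.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the word as is
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("~cheap hotel brussels"))
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.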

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
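The terms recall and precision used above have a simple arithmetic definition: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved. A minimal sketch, with invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated, for instance by comparing several search systems.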

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
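To make clear what is missing, the sketch below shows what manual right-hand truncation means in retrieval systems that do support it. The vocabulary and the use of * as truncation symbol are assumptions for the illustration; the symbol differs from system to system.

```python
# Hypothetical vocabulary of indexed words.
VOCABULARY = ["librarian", "libraries", "library", "liberty", "search", "searching"]


def truncation_match(pattern, vocabulary):
    """Return the words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [word for word in vocabulary if word.startswith(stem)]
    return [word for word in vocabulary if word == pattern]


print(truncation_match("librar*", VOCABULARY))
print(truncation_match("library", VOCABULARY))
```

Without truncation, the searcher has to type each word variant explicitly and combine the variants with OR.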

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
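The proximity operator mentioned above can be illustrated with a short sketch of what NEAR does in systems that offer it. The function name near and the default distance of 5 words are assumptions; actual systems define the allowed distance in their own way.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


TEXT = "Open access journals in marine science"
print(near(TEXT, "open", "science", max_distance=5))
print(near(TEXT, "open", "science", max_distance=3))
```

A proximity search is stricter than AND (both words anywhere in the document) and looser than an exact phrase search.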

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
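Clustering “on the fly” can be illustrated with a very crude sketch that groups result snippets only after they have been retrieved. This is not the method used by Vivisimo, which is not described in these slides; the snippets, the stop word list and the labelling rule (the most widely shared word that is not in the query) are all assumptions made for the illustration.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word that is not in the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    token_sets = []
    for text in snippets:
        words = set(re.findall(r"\w+", text.lower()))
        token_sets.append(words - query_words - STOP_WORDS)
    # Count in how many snippets each word occurs.
    doc_freq = Counter(word for words in token_sets for word in words)
    clusters = defaultdict(list)
    for text, words in zip(snippets, token_sets):
        if words:
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(text)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his wager",
    "Free Pascal compiler for the programming language",
]

for label, members in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", members)
```

Nothing is computed before the search: the groups are derived from the retrieved snippets alone, which is the point made in the slide above.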

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
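
The time saving can be illustrated with a small sketch of a multi-threaded search client. The two "engines" below are hypothetical stand-ins that return fixed lists; a real client would send HTTP requests to the primary search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]

def engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]

def meta_search(query, engines):
    """Send one query to all engines in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        # result() waits for each thread; the total waiting time is close to
        # that of the slowest engine, not the sum over all engines.
        return {name: future.result() for name, future in futures.items()}
```

The searcher formulates the query once; the program takes care of the different sources.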

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
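
A possible way to integrate results and remove repeated ones is sketched below, as an assumption and not as the method of any particular meta-search system. The ranked lists are interleaved, and a URL that was already seen is skipped. Treating URLs that differ only in a trailing slash as identical is a simplification.

```python
def merge_results(result_lists):
    """Interleave ranked result lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = url.rstrip("/")  # simplification: ignore a trailing slash
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged
```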

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet and the WWW, with the following parts]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
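
The difference between modified and new pages can be made concrete with a small sketch. It assumes that a tracking service stores a fingerprint (here a SHA-256 hash) of each page at every visit; the function names are hypothetical and fetching the pages is left out.

```python
import hashlib

def fingerprint(page_text):
    """Short, stable summary of the content of a page."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def compare_snapshots(old_fingerprints, current_pages):
    """Classify pages as new or modified since the previous check.

    old_fingerprints: dict mapping URL -> fingerprint stored at the last check
    current_pages:    dict mapping URL -> page text retrieved now
    """
    new, modified = [], []
    for url, text in current_pages.items():
        if url not in old_fingerprints:
            new.append(url)          # not seen before
        elif old_fingerprints[url] != fingerprint(text):
            modified.append(url)     # seen before, content differs
    return new, modified
```

Tracking a known page only requires revisiting its URL. Finding new pages requires a source of candidate URLs, for instance the results of a stored query in an Internet index.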

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ ; this page tells us that search results
that lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
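
This procedure can be sketched as follows. The thesaurus entry is invented for illustration (it is not copied from any of the thesaurus systems mentioned), and the OR syntax of the resulting query is an assumption: the required query syntax differs from one search system to another.

```python
# Hypothetical thesaurus entry with the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the terms found through the given relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return terms

def to_query(terms):
    """Join the terms with OR; terms of several words become exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(to_query(expand("sea", THESAURUS)))  # sea OR ocean OR "inland sea"
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus mainly increases precision.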

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
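
Both directions can be shown with a toy example. The document names and reference lists below are invented; a citation index is in essence the inverted form of the reference lists.

```python
from collections import defaultdict

# Hypothetical data: document -> documents in its reference list.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(seed, references):
    """Backward in time: follow reference lists, then the references of those."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in references.get(doc, []):
            if ref != seed and ref not in found:
                found.append(ref)
                queue.append(ref)
    return found

def build_citation_index(references):
    """Forward in time: map each document to the documents that cite it."""
    index = defaultdict(list)
    for doc, refs in references.items():
        for ref in refs:
            index[ref].append(doc)
    return dict(index)

print(snowball("seed", REFERENCES))                  # ['old1', 'old2', 'older']
print(build_citation_index(REFERENCES)["seed"])      # ['new1', 'new2']
```

The length of the list found in the citation index is the number of citations received by the seed document, within the coverage of the index.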

211

***-

Information retrieval:
using citations (scheme)
[Scheme along a time axis (past, now, future): starting from the seed document, snowball citation searching leads towards the past; citation indexing leads towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
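
The variants listed above can be generated with a few lines of code. This is a sketch under the assumption that the default page is named index.html or index.htm; the operator for link searching is not included, because it differs from one search system to another.

```python
def url_variants(url):
    """Return the three common spellings of the address of a web site."""
    base = url.split("://", 1)[-1].lstrip("/")   # drop "http://" or a leading "//"
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    base = base.rstrip("/")
    return ["//" + base + "/index.html", "//" + base + "/", "//" + base]

for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

A separate link search is then carried out for each variant, and the results are combined.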

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 163

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
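
The "simply stated" rule above can be sketched as counting incoming links. This is an illustration of the principle only, not the actual Google method; the function name and the data layout (a list of source/target pairs) are assumptions.

```python
from collections import Counter


def rank_by_inlinks(links: list[tuple[str, str]], sites: list[str]) -> list[str]:
    """Order sites by the number of links they receive (most linked first).

    links holds (source, target) pairs; links from a site to itself are
    ignored. Ties are broken alphabetically, whereas the plain DMOZ
    listing would be alphabetical throughout.
    """
    inlinks = Counter(target for source, target in links if source != target)
    return sorted(sites, key=lambda site: (-inlinks[site], site))
```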

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
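
The two components described above, a machine-made index and a search function on top of it, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary pages stands in for the documents collected by a robot, words are reduced to lower-case letters and digits, and all words of a query must occur in a page (AND logic). Real systems are far more elaborate.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word points to the URLs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs of pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the query is answered from the index, no page has to be fetched from the Internet at search time, which is why such systems can respond quickly.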

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; for many
files in other file formats the first part is indexed as well,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
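
Both ranking criteria above (many pages point to a page, and "important" pages point to it) can be captured by an iterative calculation. The following is a simplified sketch in the style of the published PageRank idea, not the actual Google implementation; the damping value 0.85, the number of iterations and the function name link_rank are assumptions chosen for illustration.

```python
def link_rank(
    graph: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Estimate the importance of pages from the links between them.

    graph maps each page to the pages it links to. A page passes a share
    of its own score to every page it links to, so a link from an
    important page counts for more than a link from an obscure page.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            targets = [t for t in targets if t in graph]
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

In a small graph where pages a, b and d all link to c, and c links only to a, page c obtains the highest score, and a scores higher than b because its single incoming link comes from the important page c.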

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
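
Expanding a query with terms taken from a thesaurus can be sketched as follows. The thesaurus here is a hypothetical toy dictionary, and the generic AND / OR notation is an assumption; the exact Boolean syntax differs from one search system to another.

```python
def expand_query(query: str, thesaurus: dict[str, list[str]]) -> str:
    """Replace each query word by an OR-group of the word and its synonyms."""
    groups = []
    for word in query.split():
        alternatives = [word] + thesaurus.get(word.lower(), [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(word)
    return " AND ".join(groups)
```

With the toy thesaurus {"fish": ["fishes", "seafood"]}, the query fish farming becomes (fish OR fishes OR seafood) AND farming, which should increase recall at some cost in precision.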

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
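
Why more than one index is needed for exhaustive results can be made concrete by comparing result sets. In this sketch the engine names and result sets are hypothetical; the function reports which fraction of the combined results each single system would have delivered.

```python
def coverage_report(results: dict[str, set[str]]) -> dict[str, float]:
    """For each search system, return its share of the combined result set."""
    union: set[str] = set().union(*results.values())
    if not union:
        return {name: 0.0 for name in results}
    return {name: len(urls) / len(union) for name, urls in results.items()}
```

If one system returns three documents and another returns two, of which one is shared, the combined set holds four documents, so the systems cover 75 % and 50 % of it respectively.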

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
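
Recall and precision, the two measures mentioned above, follow the usual definitions in information retrieval: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch, with a hypothetical function name and document identifiers:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, when 4 documents are retrieved, 3 documents are relevant in total, and 2 of the retrieved documents are relevant, precision is 2/4 and recall is 2/3.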

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
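
What manual truncation offers in systems that support it can be sketched as matching a pattern against the vocabulary of the index. The vocabulary list and the function name are hypothetical; the asterisk as truncation symbol is an assumption, since the symbol differs between systems.

```python
import fnmatch


def expand_truncation(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return all index words that match a truncated query term.

    The asterisk stands for any number of characters, so fish* matches
    fish, fishes, fishing and so on.
    """
    return sorted(word for word in vocabulary if fnmatch.fnmatchcase(word, pattern))
```

Without this feature, the searcher has to type the variants explicitly, for instance combined with OR.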

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
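
A proximity operator such as NEAR tests whether two words occur within a limited distance of each other. The sketch below shows the principle on a single text; the default distance of 5 words and the function name near are assumptions, as each system that offers NEAR defines its own limit.

```python
import re


def near(text: str, word_a: str, word_b: str, max_distance: int = 5) -> bool:
    """Return True when word_a and word_b occur within max_distance words."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, token in enumerate(tokens) if token == word_a.lower()]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)
```

A proximity search is stricter than AND (both words anywhere in the document) and looser than an exact phrase, which makes it useful for raising precision.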

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
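
Clustering on the fly means that the groups are derived from the retrieved results themselves, at search time. The sketch below is a deliberately naive illustration of that principle and is not the method used by Vivisimo: each result is filed under the word from its snippet that occurs in the largest number of snippets, ignoring the query words and a small, assumed list of stop words.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "by"}


def cluster_results(results: list[tuple[str, str]], query: str) -> dict[str, list[str]]:
    """Group (url, snippet) results under a label taken from the snippets."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def terms(snippet: str) -> set[str]:
        words = re.findall(r"[a-z]+", snippet.lower())
        return {w for w in words if w not in STOPWORDS and w not in query_words}

    # Count in how many snippets each candidate label occurs.
    doc_freq: Counter = Counter()
    for _, snippet in results:
        doc_freq.update(terms(snippet))

    clusters: dict[str, list[str]] = defaultdict(list)
    for url, snippet in results:
        candidates = sorted(terms(snippet))  # sorted, so ties resolve alphabetically
        label = max(candidates, key=lambda w: doc_freq[w]) if candidates else "other"
        clusters[label].append(url)
    return dict(clusters)
```

For the ambiguous query pascal, snippets about the philosopher and snippets about the programming language end up under different labels, which is the effect that clustering search systems aim for.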

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
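
How such an integration with removal of repeated results might work can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular meta-search system: the rule for deciding that two URLs are "the same" (ignore "www.", letter case of the host, and a trailing slash) and the function names are assumptions made here.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key (assumed rule: host without 'www.',
    lower-cased, plus the path without a trailing slash; query strings ignored)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                key = normalise(lst[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged

# Invented result lists from two hypothetical primary search systems.
engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com"]
merged = merge_results([engine_1, engine_2])
```

Interleaving by rank is only one possible design choice; a system could also weight the engines differently or re-rank the merged list.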

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison

Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
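
One simple way to detect that a page has been modified is to store a fingerprint of its contents and compare it at the next visit. The sketch below illustrates that idea only; it is an assumption about how such a tracker could work, not a description of Copernic or of any alert service. The URL and the function names are hypothetical, and fetching the page over the network is left out.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_for_update(url, page_text, stored):
    """Compare the page with the stored fingerprint; return 'new', 'changed' or 'unchanged'."""
    digest = fingerprint(page_text)
    previous = stored.get(url)
    stored[url] = digest  # remember the current state for the next check
    if previous is None:
        return "new"
    return "changed" if previous != digest else "unchanged"

stored = {}  # in a real tracker this would be kept on disk or on the server
url = "http://www.example.org/page.html"  # hypothetical address
first = check_for_update(url, "Open access   sources", stored)
second = check_for_update(url, "Open access sources", stored)
third = check_for_update(url, "Open access sources and services", stored)
```

An alert service would run such a check at regular intervals and send an email whenever the outcome is "changed" or "new".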

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
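
Matching new books against a stored interest profile can be sketched as below. The data layout (a set of keywords and a set of subject categories) and the sample books are assumptions made for illustration; they do not describe how Amazon or any other bookshop stores profiles.

```python
def matches_profile(book, profile):
    """Return True when a new book fits the stored interest profile."""
    # Normalise the title words: strip punctuation and lower-case them.
    title_words = {w.strip(".,:;!?()").lower() for w in book["title"].split()}
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical profile and newly published books.
profile = {"keywords": {"estuaries", "aquaculture"}, "categories": {"Marine biology"}}
new_books = [
    {"title": "Aquaculture: principles and practice", "category": "Agriculture"},
    {"title": "Medieval trade routes", "category": "History"},
    {"title": "Life in the tidal zone", "category": "Marine biology"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
```

The first book matches through a keyword, the third through its subject category; only these two would trigger an alert.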

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
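
The dependence on the availability of the target systems can be made concrete with a small sketch of parallel searching. The catalogue functions below are hypothetical stand-ins; a real service would talk to the library systems over a protocol such as Z39.50 or HTTP, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

def search_all(catalogues, query):
    """Send one query to every catalogue in parallel; collect hits and note failures."""
    hits, failed = {}, []
    with ThreadPoolExecutor(max_workers=len(catalogues) or 1) as pool:
        futures = {name: pool.submit(search, query) for name, search in catalogues.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result(timeout=5)
            except Exception:
                # An unavailable or slow target system must not block the others.
                failed.append(name)
    return hits, failed

# Hypothetical target systems: one answers, one is offline.
def library_a(query):
    return [f"{query}: a handbook"]

def library_b(query):
    raise ConnectionError("catalogue offline")

hits, failed = search_all({"library_a": library_a, "library_b": library_b}, "hydrology")
```

The user still receives the hits from the catalogues that answered, together with a list of the ones that could not be reached.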

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
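
The use of thesaurus relations to formulate a broader query can be sketched as follows. The thesaurus entry is invented for illustration and is not taken from any of the systems mentioned; the choice to add only UF (synonyms) and NT (narrower terms) and to combine them with OR is likewise an assumption.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
broad_query = expand_query("sea", THESAURUS, relations=("UF", "NT", "RT"))
```

Adding synonyms and narrower terms mainly increases recall; whether related terms (RT) should be added as well depends on how much loss of precision the searcher accepts.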

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
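
Both directions of citation searching can be illustrated with a small data structure. The document identifiers and the citation links below are invented; the sketch only shows that following reference lists leads to older work, while an inverted table (a citation index) leads to more recent work.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in cites.get(doc, []):
            if ref not in found and ref != seed:
                found.append(ref)
                queue.append(ref)
    return found

def build_citation_index(cites):
    """Invert the reference lists: for each document, list the documents that cite it."""
    index = defaultdict(list)
    for doc, refs in cites.items():
        for ref in refs:
            index[ref].append(doc)
    return dict(index)

older = snowball("seed", CITES)
citation_index = build_citation_index(CITES)
citing = citation_index["seed"]
```

The length of an entry in the citation index, for instance len(citation_index["old1"]), gives the number of citations received by that document within the collection.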

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
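
Generating the variants to search for can be automated with a few lines of code. The sketch below assumes that the default page of a site is called index.html or index.htm; other sites may use other names, so the list is illustrative rather than complete.

```python
def url_variants(url):
    """Return the spellings of a URL that other pages may link to."""
    variants = [url]
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            base = url[: -len(index_name)]      # ends with "/"
            variants.append(base)
            variants.append(base.rstrip("/"))   # same address without the slash
            break
    else:
        # No index page in the URL: toggle the trailing slash instead.
        if url.endswith("/"):
            variants.append(url.rstrip("/"))
        else:
            variants.append(url + "/")
    return variants

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query, and the results combined.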

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
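The idea of "normal" searching for a URL string can be illustrated on a small local collection of page texts. This is a sketch under stated assumptions: the function `citing_pages` and the sample pages are hypothetical, and a real search engine may tokenise URLs differently.

```python
def citing_pages(pages: dict[str, str], target_url: str) -> list[str]:
    """Return the names of pages whose text mentions the target URL."""
    # Search without the scheme, so that "http://..." and "www..." both match.
    needle = target_url.split("://", 1)[-1].rstrip("/").lower()
    return sorted(name for name, text in pages.items() if needle in text.lower())


# Hypothetical sample pages: one with a hyperlink, one with a plain mention.
sample_pages = {
    "page-a": '<a href="http://www.computer.com/directory/subdirectory/web-page.html">link</a>',
    "page-b": "See www.computer.com/directory/subdirectory/web-page.html for details",
    "page-c": "Nothing relevant here",
}
print(citing_pages(sample_pages, "http://www.computer.com/directory/subdirectory/web-page.html"))
```

Unlike a search restricted to hyperlink fields, this approach also finds pages that mention the address in plain text without linking to it.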

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 164


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources / systems / services
»directly useful

• Secondary sources / systems / services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
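The simplified rule stated above (more links received means a higher rank) can be sketched as follows. This is not Google's actual method; the function `rank_by_inlinks` and the sample sites are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(entries: list[str], links: list[tuple[str, str]]) -> list[str]:
    """Order directory entries by the number of links they receive."""
    wanted = set(entries)
    # Count only links that point to a site listed in the directory.
    counts = Counter(target for _source, target in links if target in wanted)
    # Most-linked first; alphabetical order breaks ties.
    return sorted(entries, key=lambda site: (-counts[site], site))


# Hypothetical directory category and (source, target) link pairs.
category = ["a.org", "b.org", "c.org"]
web_links = [("x.org", "b.org"), ("y.org", "b.org"), ("x.org", "c.org"), ("z.org", "other.org")]
print(rank_by_inlinks(category, web_links))
```

An alphabetical directory would list a.org first; the link-based ordering puts the most cited site first instead.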

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
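The two components described above (an index built automatically from page texts, and a search function over that index) can be sketched in a few lines. This is a toy illustration, not the design of any real search engine: the page collection is hypothetical, the crawler is left out, and combining the query words with AND is an assumption.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word maps to the URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query (AND logic)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a "robot" would have collected.
collected = {
    "u1": "Marine biology of the North Sea",
    "u2": "Marine science portal",
    "u3": "Fishing news",
}
print(sorted(search(build_index(collected), "marine biology")))
```

The query is answered from the prepared index only, which is why such systems do not need to scan the live Internet at search time.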

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
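The ranking idea on this slide (a page ranks higher when many pages, and especially "important" pages, point to it) can be sketched with a simple iterative link score. This is an illustrative simplification and should not be read as Google's actual algorithm; the function `link_scores`, the damping value 0.85 and the sample link graph are assumptions.

```python
def link_scores(links: dict[str, list[str]], damping: float = 0.85,
                rounds: int = 50) -> dict[str, float]:
    """Give each page a score that depends on the scores of the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small base score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score


# Hypothetical graph: "a" and "b" each receive one link, but "a" is cited by
# the well-linked page "c", while "b" is cited only by the uncited page "d".
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
for page, value in sorted(link_scores(graph).items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this sample, "a" ends up far above "b" although both receive exactly one link, which shows the second criterion on the slide: a link from an important page counts for more.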

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
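What a tilde in front of a word asks for can be imitated locally with a small synonym table. This sketch only illustrates the principle; the function `expand_query` and the synonym table are hypothetical, and the synonyms that a search engine actually applies are not known to the user.

```python
def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace each ~word by an OR group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonym known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


# Hypothetical synonym table, for illustration only.
table = {"car": ["automobile", "vehicle"]}
print(expand_query("cheap ~car rental", table))
```

Writing the OR group by hand, with synonyms chosen by the searcher, is the manual alternative to this automatic expansion.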

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
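Recall and precision, the two measures named on this slide, can be computed for any result set once it is known which documents are relevant. The sketch below uses the standard definitions; the function name and the sample document identifiers are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a result set against the known relevant set."""
    hits = len(retrieved & relevant)
    # Precision: which fraction of the retrieved documents is relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant documents exist.
found = {"d1", "d2", "d3", "d4"}
wanted = {"d1", "d2", "d5"}
print(precision_recall(found, wanted))
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated, for instance by comparing the results of several search systems.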

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
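When a system offers no truncation, the searcher can list the word variants explicitly and combine them with OR. A minimal sketch of that workaround follows; the function `expand_truncation` and the word list are hypothetical, and the searcher still has to supply the vocabulary.

```python
def expand_truncation(truncated: str, vocabulary: list[str]) -> str:
    """Turn a truncated term such as "librar*" into an explicit OR query."""
    prefix = truncated.rstrip("*").lower()
    # Keep every known word that starts with the prefix, without duplicates.
    matches = sorted({word.lower() for word in vocabulary if word.lower().startswith(prefix)})
    return " OR ".join(matches)


# Hypothetical word list supplied by the searcher.
words = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", words))
```

The resulting query is longer than a truncated term, which matters when the system limits the number of words per query.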

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
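The effect of a proximity operator such as NEAR can be shown on a single text: two words match only when they occur within a given number of word positions of each other. This is a sketch of the concept only; the function `near` and the default distance of 5 words are assumptions, and systems that offer NEAR define the distance in their own way.

```python
import re


def near(text: str, first: str, second: str, distance: int = 5) -> bool:
    """Return True when both words occur within `distance` word positions."""
    words = re.findall(r"\w+", text.lower())
    positions_first = [i for i, word in enumerate(words) if word == first.lower()]
    positions_second = [i for i, word in enumerate(words) if word == second.lower()]
    return any(abs(i - j) <= distance for i in positions_first for j in positions_second)


sentence = "open access journals improve scientific communication"
print(near(sentence, "open", "journals", distance=2))
print(near(sentence, "open", "communication", distance=2))
```

Without such an operator, an exact phrase search is the closest substitute, but it misses texts in which other words stand between the two terms.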

96

**--

Meta-search systems:
scheme 1
Diagram components: User (In / Out); client computer + WWW
client program; WWW server computer; Internet / WWW; WWW
server computers with Internet search systems.

97

**--

Meta-search systems:
scheme 2
Diagram components: User (In / Out); client computer +
multi-threaded Internet search client program; Internet / WWW;
WWW server computers with Internet search systems.

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Diagram components: User (In / Out); client computer + WWW
client program; WWW server computer; Internet / WWW; WWW
server computers with Internet search systems.

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
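The relations shown above (one meta-search system passing a query to several search systems and combining what comes back) can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists, and the merging rule (sum of 1/rank) is only one possible choice, not the method of any named system.

```python
from typing import Callable


def meta_search(query: str, engines: dict[str, Callable[[str], list[str]]]) -> list[str]:
    """Send one query to several search systems and merge their ranked results."""
    scores: dict[str, float] = {}
    for _name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            # A result scores higher when it is ranked high and found by more systems.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Duplicates are merged automatically because URLs are dictionary keys.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical stand-ins for two Internet search systems.
def engine_one(query: str) -> list[str]:
    return ["a.org", "b.org", "c.org"]


def engine_two(query: str) -> list[str]:
    return ["b.org", "d.org"]


print(meta_search("marine science", {"one": engine_one, "two": engine_two}))
```

A real meta-search system also has to translate the query into the syntax of each underlying system, which this sketch leaves out.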

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
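The idea of clustering on the fly can be illustrated with a toy example that groups the titles of retrieved hits under the most frequent term they contain. This is only a sketch under simple assumptions (a tiny stop word list, hypothetical hit titles); it is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on"}


def cluster_hits(hits, query):
    """Group hit titles under the most frequent non-query term in each title."""
    query_terms = set(query.lower().split())

    def terms(title):
        return [w for w in re.findall(r"[a-z]+", title.lower())
                if w not in STOPWORDS and w not in query_terms]

    # Count in how many titles each candidate term occurs.
    counts = Counter(t for h in hits for t in set(terms(h)))
    clusters = defaultdict(list)
    for h in hits:
        candidates = terms(h)
        # The most frequent term wins; ties are broken alphabetically.
        label = min(candidates, key=lambda t: (-counts[t], t)) if candidates else "other"
        clusters[label].append(h)
    return dict(clusters)


# Hypothetical hit titles for the query "jaguar".
hits = ["Jaguar car dealers", "Jaguar car reviews",
        "Jaguar cat habitat", "Jaguar cat diet"]
clusters = cluster_hits(hits, "jaguar")
print(clusters)
```

Nothing is computed before the search: the headings are derived only from the hits that come back.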

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before you start
using the system.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]
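The client-based arrangement can be sketched as a program that sends the same query to several search systems in parallel threads. The two engine functions below are hypothetical stand-ins that return fixed URLs; a real client program would contact the search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(engines, query):
    """Send `query` to every engine in parallel; return {engine name: results}."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real Internet search systems.
def engine_a(q):
    return [f"http://a.example/{q}/1", f"http://a.example/{q}/2"]


def engine_b(q):
    return [f"http://b.example/{q}/1"]


results = search_all({"A": engine_a, "B": engine_b}, "tides")
print(results)
```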

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
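The integration step with removal of repeated results can be sketched as follows. The normalisation rule (ignore "www." and a trailing slash) and the round-robin merging order are assumptions made for this illustration; actual meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return (host, parts.path.rstrip("/"), parts.query)


def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


merged = merge_results([
    ["http://www.vub.ac.be/BIBLIO/", "http://example.org/a"],
    ["http://vub.ac.be/BIBLIO", "http://example.org/b"],
])
print(merged)
```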

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet, with the WWW as a part of it. Labels in the diagram:]
• telnet, ftp, ...
• Databases and file archives accessible through the Internet
• CGI, ASP, ...
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
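The difference between a modified page and a new page can be sketched with content fingerprints: a stored fingerprint that no longer matches indicates a modification, while a URL without a stored fingerprint is new. The sketch assumes that the page contents have already been fetched; the URLs and contents are hypothetical.

```python
import hashlib


def fingerprint(page_content: bytes) -> str:
    """Short, fixed-length summary of the page content."""
    return hashlib.sha256(page_content).hexdigest()


def check_pages(stored, fetched):
    """Classify fetched pages as 'new', 'modified' or 'unchanged'.

    stored:  {url: fingerprint from the previous visit}
    fetched: {url: content retrieved now, as bytes}
    """
    report = {}
    for url, content in fetched.items():
        if url not in stored:
            report[url] = "new"
        elif stored[url] != fingerprint(content):
            report[url] = "modified"
        else:
            report[url] = "unchanged"
    return report


# Hypothetical pages.
stored = {"http://example.org/a": fingerprint(b"version 1"),
          "http://example.org/b": fingerprint(b"same")}
fetched = {"http://example.org/a": b"version 2",
           "http://example.org/b": b"same",
           "http://example.org/c": b"brand new"}
report = check_pages(stored, fetched)
print(report)
```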

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
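How a stored interest profile can drive such alerts is sketched below. The profile structure, the matching rule (a keyword in the title, or an identical subject category) and the book records are assumptions for illustration only; they do not describe how any particular bookshop implements its service.

```python
def matches_profile(book, profile):
    """True if the book matches a stored keyword or a stored subject category."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Titles of the new books that should trigger an alert for this profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical profile and hypothetical new books.
profile = {"keywords": ["oceanography"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral Reef Fishes", "category": "Marine biology"},
    {"title": "Medieval Cookery", "category": "History"},
]
alerts = new_book_alerts(new_books, profile)
print(alerts)
```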

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term in a thesaurus:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
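This use of a thesaurus can be sketched as query expansion: look up a term, collect the terms linked by chosen relations, and combine them with OR. The tiny thesaurus below is a hypothetical example built around "sea"; real thesaurus systems are far larger and are consulted through their own interfaces.

```python
# Hypothetical toy thesaurus using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_term(term, thesaurus, relations=("UF", "NT")):
    """Return the term followed by the terms linked through the given relations."""
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in relations:
        for linked in entry.get(relation, []):
            if linked not in expanded:
                expanded.append(linked)
    return expanded


def to_query(terms):
    """Combine terms with OR; multi-word terms are quoted as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


terms = expand_term("sea", THESAURUS)
query = to_query(terms)
print(query)
```

Adding synonyms and narrower terms tends to increase recall; choosing a narrower term instead of the original one tends to increase precision.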

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
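Both directions can be sketched on a small citation graph. The documents A to F and their reference lists are hypothetical; `snowball` follows reference lists back to older work, while `citing` does what a citation index offers, namely finding the more recent documents that cite the seed.

```python
# Hypothetical reference lists: document -> documents it cites.
CITES = {
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
    "F": ["D", "C"],
}


def snowball(seed, cites):
    """Older documents reachable by following reference lists from the seed."""
    found, stack = set(), [seed]
    while stack:
        doc = stack.pop()
        for ref in cites.get(doc, []):
            if ref not in found:
                found.add(ref)
                stack.append(ref)
    return found


def citing(seed, cites):
    """More recent documents whose reference lists contain the seed."""
    return {doc for doc, refs in cites.items() if seed in refs}


older = snowball("D", CITES)
newer = citing("D", CITES)
print(sorted(older), sorted(newer))
```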

211

***-

Information retrieval:
using citations (scheme)
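[Diagram: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching points from the seed document towards the Past; citation indexing points from the seed document towards the Future.]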

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
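The variants listed above can be generated mechanically before they are pasted into a search system. The following is a minimal sketch; the function name `url_variants` and the list of default file names are assumptions for illustration, and the host name is the placeholder used on this slide.

```python
def url_variants(url, default_pages=("index.html", "index.htm")):
    """Return common spellings under which the same page may be linked.

    Assumption: the server delivers the same page for the directory
    with and without a trailing slash, and for the default file name.
    """
    base = url
    # Strip a default file name such as index.html, if present.
    for page in default_pages:
        if base.endswith("/" + page):
            base = base[: -len(page)]
            break
    base = base.rstrip("/")
    variants = [base, base + "/"]
    variants += [base + "/" + page for page in default_pages]
    return variants


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, and the result lists can be combined.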

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 165

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop
7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and it has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
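The "simply stated" rule can be sketched as a count of incoming links. This is only an illustration of the idea, not the method Google actually uses; the link data, the site names and the function name `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

# Hypothetical link data: (linking page, linked site).
links = [
    ("a.example", "site1.example"),
    ("b.example", "site1.example"),
    ("c.example", "site2.example"),
    ("a.example", "site3.example"),
    ("b.example", "site3.example"),
    ("c.example", "site3.example"),
]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of distinct pages that link to them.

    Sites with more incoming links come first; ties are broken
    alphabetically, which is what a plain directory would do anyway.
    """
    inlinks = Counter(target for _source, target in set(links))
    return sorted(sites, key=lambda site: (-inlinks[site], site))


directory_entries = ["site1.example", "site2.example", "site3.example"]
print(rank_by_inlinks(directory_entries, links))
```

With the sample data, the alphabetical order site1, site2, site3 becomes site3, site1, site2.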

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: within the complete WWW, a global subject directory can lead to
- a directory limited to sources in/of a country or region
- a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
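The two components described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the pages are hypothetical and are held in memory instead of being collected by a robot, and the names `build_index` and `search` are not taken from any real system.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for documents collected by the "robot".
pages = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data and marine science links",
    "http://example.org/c": "Civil engineering handbook",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: map each word to the URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(sorted(search(index, "marine science")))
```

The query is answered from the prebuilt index only; no page is fetched at search time, which is why such a system can answer quickly but can also be out of date.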

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
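Both ranking criteria named above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style calculation. This is a sketch under stated assumptions, not Google's actual implementation: the mini-web, the damping value 0.85, the number of iterations and the function name `link_rank` are all chosen for illustration.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Simplified PageRank-style scores computed by repeated updating.

    graph maps each page to the list of pages it links to;
    every linked page must also appear as a key.
    """
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical mini-web.
graph = {
    "hub": ["popular", "niche"],
    "popular": ["hub"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "niche": [],
}
scores = link_rank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this mini-web, "popular" scores higher than "niche" because more pages point to it, and "niche" scores higher than "fan1" because the one page that points to it is itself well linked.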

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
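What such an expansion amounts to can be sketched as follows. The synonym table is hypothetical and tiny, the function name `expand_query` is an assumption, and the sketch only illustrates the idea; it does not describe how Google implements the tilde operator internally.

```python
# Hypothetical synonym table; a real search system uses its own, much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("~cheap hotel brussels"))
# prints: (cheap OR inexpensive OR budget) hotel brussels
```

Only the marked word is broadened; the other words of the query are left as they are.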

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
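The two measures named on this slide, recall and precision, can be computed as in the worked example below. The document identifiers and both result lists are hypothetical; the example only shows the arithmetic and is not a measurement of any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are relevant to the information need.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d7", "d8", "d9", "d2"]
specialised_engine = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_engine, relevant))      # prints: (0.4, 0.5)
print(precision_recall(specialised_engine, relevant))  # prints: (0.75, 0.75)
```

In this invented case the specialised result set is better on both measures, because it contains more of the relevant documents and fewer irrelevant ones.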

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
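When truncation is not offered, the searcher can spell out the word variants and combine them with OR. A minimal sketch of that workaround follows; the vocabulary is a hypothetical word list, and the function name `expand_truncation` is an assumption for illustration.

```python
def expand_truncation(stem, vocabulary):
    """Return an OR query of all vocabulary words that start with the stem.

    A trailing * on the stem is accepted and ignored, so that the usual
    truncation notation of other retrieval systems can be used.
    """
    stem = stem.rstrip("*").lower()
    matches = sorted(
        {word.lower() for word in vocabulary if word.lower().startswith(stem)}
    )
    return " OR ".join(matches) if matches else stem


# Hypothetical word list.
vocabulary = ["library", "libraries", "librarian", "librarians", "liberty", "labrador"]
print(expand_truncation("librar*", vocabulary))
# prints: librarian OR librarians OR libraries OR library
```

The quality of the result depends on the word list: variants that are missing from it are also missing from the query.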

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
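A proximity condition can still be applied afterwards, on texts that have already been retrieved. The sketch below shows what a NEAR operator checks; the function name `near` and the default distance of 5 words are assumptions for illustration and do not correspond to a particular retrieval system.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(
        abs(a - b) <= max_distance for a in positions_a for b in positions_b
    )


text = "Citation searching on the web is a form of information retrieval"
print(near(text, "information", "retrieval", max_distance=1))  # prints: True
print(near(text, "citation", "retrieval", max_distance=3))     # prints: False
```

Such a filter can raise precision, but it cannot bring back documents that the search system did not retrieve in the first place.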

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out) <-> client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
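
The idea of grouping results on the fly can be illustrated with a very simple sketch that places result titles under headings made of words that they share. This is not the method used by Vivisimo, which is not described here; the stop word list, the function name and the sample titles are assumptions for illustration.

```python
from collections import defaultdict
import re

# Small, hypothetical stop word list; real systems use much longer ones.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "on", "to"}


def cluster_results(titles, query_terms=(), min_size=2):
    """Group result titles under headings made of words they share."""
    skip = STOPWORDS | {t.lower() for t in query_terms}
    clusters = defaultdict(list)
    for title in titles:
        for word in set(re.findall(r"[a-z]+", title.lower())):
            if word not in skip:
                clusters[word].append(title)
    # Keep only headings that group at least min_size results.
    return {label: members for label, members in clusters.items()
            if len(members) >= min_size}


titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]
for label, members in sorted(cluster_results(titles, ["jaguar"]).items()):
    print(label, members)
```

The grouping is computed only from the retrieved results, so no processing of the documents is needed before the search.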

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
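
The time saving comes from sending one query to several search systems in parallel instead of one after the other. The sketch below shows this with threads; the two engine functions are hypothetical stand-ins that return fixed example URLs, so that the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # Hypothetical stand-in for a real search system.
    return [f"http://example.org/one/{query}"]


def engine_two(query):
    # Hypothetical stand-in for a second search system.
    return [f"http://example.org/two/{query}"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel; collect the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("ocean", {"one": engine_one, "two": engine_two})
print(results)
```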

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
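
How such an integration with removal of repeated results could work is sketched below: the ranked lists are interleaved and a page is kept only the first time it appears. The normalisation rule (ignoring "www." and a trailing slash) and the function names are assumptions for illustration; actual meta-search systems may use other rules.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a key, so that trivially different forms match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists, keeping the first occurrence of each page."""
    seen = set()
    merged = []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_a = ["http://www.dogpile.com/", "http://example.org/a"]
list_b = ["http://dogpile.com", "http://example.org/b"]
print(merge_results([list_a, list_b]))
# ['http://www.dogpile.com/', 'http://example.org/a', 'http://example.org/b']
```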

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
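
Tracking modifications of an existing page can be done by storing a fingerprint of its contents and comparing it with a fingerprint of the contents retrieved later. The sketch below assumes that the page texts have already been fetched; the function names and the example URLs are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """List tracked URLs whose contents differ from the stored fingerprint.

    previous: URL -> fingerprint stored at the previous check
    current:  URL -> page text retrieved now
    """
    return sorted(url for url, text in current.items()
                  if url in previous and previous[url] != fingerprint(text))


previous = {
    "http://example.org/news": fingerprint("old text"),
    "http://example.org/about": fingerprint("same text"),
}
current = {
    "http://example.org/news": "new text",
    "http://example.org/about": "same text",
}
print(changed_pages(previous, current))
# ['http://example.org/news']
```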

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
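
The principle of such an alert service can be sketched as follows: the stored query is run again at regular intervals and only the hits that were not reported before are sent to the user. This is an illustration of the principle under simple assumptions, not a description of how Google Alert is implemented; the function name and the example URLs are hypothetical.

```python
def run_alert(seen_urls, latest_hits):
    """Return the hits not reported before and the updated set of seen URLs.

    seen_urls:   set of URLs already reported to the user
    latest_hits: list of URLs returned by the stored query today
    """
    fresh = [url for url in dict.fromkeys(latest_hits) if url not in seen_urls]
    return fresh, seen_urls | set(fresh)


seen = {"http://example.org/a"}
fresh, seen = run_alert(seen, ["http://example.org/a", "http://example.org/b"])
print(fresh)  # ['http://example.org/b']
```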

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term in a thesaurus:
BT (= Broader Term): term(s) with broader meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
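
The procedure described above can be sketched in a few lines: look up the search term in a thesaurus, collect the synonyms (UF) and the narrower terms (NT), and combine them with OR into one query. The thesaurus entry below is a small, hypothetical example and the function name is an assumption; a real thesaurus system offers many more terms.

```python
# Hypothetical thesaurus entry, for illustration only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal sea"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and its selected thesaurus relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Terms of more than one word are placed between quotation marks.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
# sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus can increase precision.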

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
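
Both directions of citation searching can be illustrated with a small table of references. In the sketch below, each document is linked to the documents that it cites; the document names and the function names are hypothetical. Following the references leads to older publications (the snowball effect), while inverting the table, as a citation index does, leads to more recent publications.

```python
def snowball(seed, references, depth=2):
    """Follow the references backwards in time, up to 'depth' steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier
                    for ref in references.get(doc, [])} - found - {seed}
        found |= frontier
    return sorted(found)


def citing_documents(seed, references):
    """Find the more recent documents that cite the seed document."""
    return sorted(doc for doc, refs in references.items() if seed in refs)


# Hypothetical data: document -> documents cited in its reference list.
references = {
    "seed2000": ["a1995", "b1998"],
    "a1995": ["c1990"],
    "d2003": ["seed2000"],
    "e2004": ["seed2000", "d2003"],
}
print(snowball("seed2000", references))          # ['a1995', 'b1998', 'c1990']
print(citing_documents("seed2000", references))  # ['d2003', 'e2004']
```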

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search
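
Once a citation search has delivered the number of citations for each publication, the totals per author follow from a simple addition. The sketch below uses invented example records; the field names "authors" and "times_cited" are assumptions and do not correspond to a particular database.

```python
from collections import Counter


def citations_per_author(papers):
    """Add up the citations received by the papers of each author."""
    totals = Counter()
    for paper in papers:
        for author in paper["authors"]:
            totals[author] += paper["times_cited"]
    return totals


# Invented example records, for illustration only.
papers = [
    {"authors": ["Author A", "Author B"], "times_cited": 12},
    {"authors": ["Author A"], "times_cited": 3},
]
print(citations_per_author(papers))
# Counter({'Author A': 15, 'Author B': 12})
```

Such a total is only an estimate: it depends on the coverage of the database that was searched and on variants in the spelling of author names.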

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
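The three variants above can be generated from any one of them. The following is only a minimal sketch: the function name `url_variants` is hypothetical, and it assumes that the default document of the site is named `index.html`, which is not true for every web server.

```python
def url_variants(url: str) -> list[str]:
    """Return three common spellings of the same address, most specific first.

    Assumption: the default document is called index.html.
    """
    base = url.rstrip("/")
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system of your choice.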

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
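A small sketch of how such a "normal" query could be prepared. It assumes that the search engine treats text between double quotes as an exact phrase (check the help pages of the system you use); the function name `phrase_queries` is hypothetical.

```python
def phrase_queries(url: str) -> list[str]:
    """Build exact-phrase queries for a URL, with and without the scheme.

    Assumption: the search engine interprets "..." as an exact phrase.
    """
    url = url.strip()
    without_scheme = url.split("://", 1)[-1]
    queries = ['"' + url + '"']
    if without_scheme != url:
        # Pages often mention a URL without the leading http://
        queries.append('"' + without_scheme + '"')
    return queries


for query in phrase_queries("http://www.computer.com/directory/subdirectory/web-page.html"):
    print(query)
```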

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 166

1

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
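The "simply stated" rule above (more links received means a higher rank) can be sketched as a plain count of incoming links. This is only an illustration of that simplified statement, not the actual method used by Google; the function name and the example site names are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Rank sites by the number of distinct other sites that link to them.

    links: iterable of (source_site, target_site) pairs.
    Self-links and duplicate links are ignored.
    """
    inlinks = Counter(target for source, target in set(links) if source != target)
    # Most links first; alphabetical order breaks ties.
    return sorted(inlinks, key=lambda site: (-inlinks[site], site))


links = [("a", "c"), ("b", "c"), ("a", "b"), ("c", "c")]
print(rank_by_inlinks(links))
```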

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
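The two components described above can be illustrated with a toy inverted index. This is a minimal sketch under simplifying assumptions: the pages are already collected in a dictionary (a real system uses a crawler), all query words must occur in a page (implicit AND), and the names `build_index` and `search` and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it.

    pages: dict of URL -> page text, standing in for what a robot collected.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "http://example.org/1": "Marine biology of the North Sea",
    "http://example.org/2": "Marine engineering",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why the answer can be fast but also out of date.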

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
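Both ranking factors above (many pages point to a page, and "important" pages point to it) can be combined in an iterative calculation. The sketch below is a simplified PageRank-style iteration, given only as an illustration: the slides do not describe the algorithm actually used by Google, and the function name, the damping value 0.85 and the number of iterations are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links: list of (source_page, target_page) pairs.
    damping and iterations are assumed values, chosen for illustration.
    """
    pages = sorted({page for edge in links for page in edge})
    outgoing = {p: [t for s, t in links if s == p] for p in pages}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for p in pages:
            # A page without outgoing links spreads its score over all pages.
            targets = outgoing[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


scores = link_rank([("a", "c"), ("b", "c"), ("c", "a")])
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small example page c receives two links and ranks first; page a receives only one link, but from the important page c, so it ranks well above page b, which receives none.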

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
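What such an automatic expansion amounts to can be sketched as follows. This is an illustration only: the synonym table is hypothetical and tiny, the function name `expand_query` is invented, and the way Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"]}


def expand_query(query, synonyms):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental", SYNONYMS))
```

The same OR group can of course be typed manually in systems that do not offer the tilde, which corresponds to the manual expansion of a query.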

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
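Recall and precision, mentioned above, can be computed for a single search when the set of relevant documents is known, which in practice is usually only the case in test collections. A minimal sketch, with a hypothetical function name and invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents are relevant, 2 of those were found.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]))
```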

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
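To show what manual truncation means in search systems that do support it, here is a minimal sketch. The `*` symbol, the function name and the word list are assumptions chosen for illustration; this is not a description of how any particular search engine works.

```python
def truncation_match(pattern, words):
    """Return the words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        # Truncation: accept every word that starts with the given stem.
        return [w for w in words if w.lower().startswith(stem)]
    # Without the truncation symbol only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]

# Hypothetical index vocabulary, for illustration only.
vocabulary = ["library", "libraries", "librarian", "liberty", "Library"]

print(truncation_match("librar*", vocabulary))
print(truncation_match("library", vocabulary))
```

Without truncation, the searcher has to enter each word variant separately (for instance combined with OR).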

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
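A proximity operator such as NEAR can be sketched as follows. This is an illustrative assumption about how such an operator may be defined (distance counted in words); the function name and the default distance are hypothetical.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

sentence = "Open access journals are listed in a directory of free journals"
print(near(sentence, "open", "directory", max_distance=3))
print(near(sentence, "open", "directory", max_distance=7))
```

A plain AND query would accept the sentence in both cases; the proximity condition is stricter and can therefore increase precision.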

96

**--

Meta- search systems:
scheme 1
[Scheme: User (In / Out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Scheme: User (In / Out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
= ...

99

**--

Meta-search systems: server-based:
scheme
[Scheme: User (In / Out) ↔ client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Scheme: the user queries an Internet meta-search system, which forwards the query to Internet search system 1 and Internet search system 2; each of these searches its own collected database (1 and 2), built from WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme: User (In / Out) ↔ client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]
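The idea of a multi-threaded search client program can be sketched in a few lines. The two "engines" below are hypothetical stand-ins that return fixed lists; a real client would send the query over the Internet to each search system instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems:
# each returns a ranked list of URLs for a query.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]

def multi_search(query, engines):
    """Send the same query to every engine in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # One thread per search system, all started before any result is awaited.
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = multi_search("open access", {"A": engine_a, "B": engine_b})
print(results)
```

Because the requests run side by side, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.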

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
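The integration of results with removal of repeated results could look roughly like the sketch below. The normalisation rule (ignore case of the host name, a leading "www." and a trailing slash) and the round-robin merging are assumptions made for illustration; actual meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first occurrence of each page."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
list_1 = ["http://www.example.org/page/", "http://example.org/other"]
list_2 = ["http://example.org/page", "http://example.net/third"]
merged = merge_results([list_1, list_2])
print(merged)
```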

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: sources accessible through the Internet]
• telnet, ftp, ...
• WWW: static texts that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes; also Word files and PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
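The core of such a tracking program can be sketched as a comparison of stored fingerprints with the current page contents. This is an illustration only: page texts are passed in as strings here, whereas a real service would first download them, and all names are hypothetical.

```python
import hashlib

def fingerprint(page_text):
    """Return a fingerprint (SHA-256 hash) of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(snapshots, known):
    """Compare current page contents with stored fingerprints.

    snapshots: dict URL -> current text; known: dict URL -> stored fingerprint.
    Returns (new_urls, changed_urls) and updates `known` in place.
    """
    new_urls, changed_urls = [], []
    for url, text in snapshots.items():
        digest = fingerprint(text)
        if url not in known:
            new_urls.append(url)
        elif known[url] != digest:
            changed_urls.append(url)
        known[url] = digest
    return new_urls, changed_urls

known = {}
first = check_pages({"http://example.org/a": "version 1"}, known)
second = check_pages({"http://example.org/a": "version 2",
                      "http://example.org/b": "hello"}, known)
print(first)
print(second)
```

An alert service would run such a check at regular intervals and send the lists of new and changed pages to the subscriber by email.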

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM : RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic : ?
To find book titles published before 1990 : ?
To find a book title through a title search : ?
To find the price of a book : ?
To be informed regularly about new books : ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM : RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic : Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 : national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general : Library of Congress, British Library, Infoball
To find the price of a book : Global Books in Print, Infoball, online bookshops
To be informed regularly about new books : Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
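The procedure described above (look up a term in a thesaurus, then build the query from the terms found) can be sketched as follows. The thesaurus entries are invented for illustration, and the choice to expand with UF and NT terms only is an assumption; which relations are useful depends on the search.

```python
# A tiny hypothetical thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine environment"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms linked by the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms mainly raises recall; choosing a more specific term from the thesaurus instead of the original one is a way to raise precision.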

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
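Both directions of citation searching can be illustrated with a small citation graph. The documents and their reference lists below are invented; a real citation index holds the same kind of data for millions of publications.

```python
# Hypothetical citation data: each document lists the earlier documents it cites.
CITES = {
    "seed_2000": ["old_1990", "old_1995"],
    "old_1995": ["old_1990"],
    "new_2003": ["seed_2000"],
    "new_2004": ["seed_2000", "new_2003"],
}

def snowball(doc, cites):
    """Backward search: follow reference lists repeatedly to collect older documents."""
    found, queue = [], list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found

def cited_by(doc, cites):
    """Forward search, as in a citation index: which documents cite `doc`?"""
    return sorted(d for d, refs in cites.items() if doc in refs)

print(snowball("seed_2000", CITES))
print(cited_by("seed_2000", CITES))
```

The backward search needs only the documents themselves, while the forward search needs an index built over all citing documents, which is why a citation index is required for it.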

211

***-

Information retrieval:
using citations (scheme)
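[Scheme: a time axis running from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, while citation indexing leads forward to more recent publications that cite it.]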

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
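
As a rough illustration of counting web citations ("sitations"), the sketch below assumes that a list of (linking page, linked page) pairs has already been obtained, for instance from a link search in a search engine. The example URLs and the function name are invented for this sketch.

```python
# Hypothetical link data: (linking page, linked page) pairs.
links = [
    ("http://a.example/", "http://target.example/page.html"),
    ("http://b.example/", "http://target.example/page.html"),
    ("http://b.example/", "http://other.example/"),
    ("http://a.example/", "http://target.example/page.html"),  # repeated link
]


def citing_pages_for(links, target):
    """Return the distinct pages that link to the target page, sorted."""
    return sorted({source for source, linked in links if linked == target})


citing_pages = citing_pages_for(links, "http://target.example/page.html")
sitation_count = len(citing_pages)
```

The list `citing_pages` answers the question of who linked to the page, and `sitation_count` gives a simple measure of its impact.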

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
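
The variants listed above can be generated automatically. The following is a minimal sketch, assuming that "index.html" (or "index.htm") is the default document name on the web server; the function name is hypothetical.

```python
def url_variants(url):
    """Return the spellings of one address that a link query should cover."""
    base = url
    # Assumption: index.html / index.htm is the server's default document.
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/index.html")
```

Each element of `variants` can then be used in a separate link query, and the results can be combined.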

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 167

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: within the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
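
The mechanism described above can be illustrated with a very small inverted index. This is only a sketch under stated assumptions: the pages are invented stand-ins for what a crawler ("robot") would collect, and combining the query words with an implicit AND is a choice made for this example, not a description of any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Fishing and marine biology",
    "http://example.org/c": "Civil engineering resources",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
hits = search(index, "marine biology")
```

The query is answered from the prebuilt index only; the pages themselves are not visited at search time, which is why such systems can respond quickly.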

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
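
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified link-analysis iteration in the spirit of PageRank. The slides do not give the formula, so the damping value, the number of iterations and the toy link graph below are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style ranking on a dict: page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: a and b link to c, and c links to d.
toy_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
rank = link_rank(toy_links)
```

In this toy example, page c ranks above a and b because two pages point to it, and d ranks high because the "important" page c points to it.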

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
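
What such an automatic expansion amounts to can be sketched as follows. The synonym table is invented for this example (the list that Google uses is not given in these slides), and rewriting the marked word as an OR group is an assumption about how the expansion could be expressed, not a description of Google's internals.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile", "auto"], "cheap": ["inexpensive"]}


def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each ~word as (word OR synonym ...); leave other words as-is."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no known synonyms: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table contains a synonym for it.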

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
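
The two quality measures named on this slide, recall and precision, can be made concrete with a small calculation. The following is only a minimal sketch in Python; the function name and the document identifiers are hypothetical and serve as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers of the documents the search system returned
    relevant:  identifiers of all documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
print(p, r)  # 0.75 0.5
```

A specialised system that returns fewer irrelevant documents raises the first number; one that finds more of the relevant documents raises the second.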

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
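
To show what truncation and stemming mean in practice, here is a minimal sketch in Python. The vocabulary is a hypothetical word list, and the suffix rules are a toy illustration, not the stemming algorithm of any real search system.

```python
from fnmatch import fnmatchcase

# Hypothetical index vocabulary, for illustration only.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "book", "books"]


def expand_truncated(term, vocabulary=VOCABULARY):
    """Manual truncation: 'librar*' matches every word starting with 'librar'."""
    return [word for word in vocabulary if fnmatchcase(word, term)]


def naive_stem(word):
    """Very crude stemming: strip a few common English plural suffixes.

    This is a toy rule set, not a real stemming algorithm.
    """
    for suffix in ("ies", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


print(expand_truncated("librar*"))  # ['library', 'libraries', 'librarian']
print(naive_stem("libraries"), naive_stem("books"))  # library book
```

With truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form automatically.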

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
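
What a proximity operator such as NEAR does can be sketched in a few lines of Python. The function name, the default distance and the sample sentence are assumptions made for this illustration; real systems define NEAR in their own ways.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "Open access journals are listed in a directory maintained by a library."
print(near(doc, "open", "journals", max_distance=2))  # True
print(near(doc, "open", "library", max_distance=2))   # False
```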

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
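
The idea of clustering "on the fly" can be illustrated with a toy sketch in Python: the groups are derived only from the snippets that a search returned. This is NOT the method used by Vivisimo, which is not described here; the stopword list, the function name and the sample snippets are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "is", "to", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that each contains.

    Labels are chosen after the search, from the retrieved snippets only,
    so no pre-processing of the document collection is needed.
    """
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_words)

    # Count in how many snippets each candidate label occurs.
    frequency = Counter(word for words in tokenised for word in words)

    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Pick the label shared with the most snippets;
        # ties are broken alphabetically to keep the output stable.
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


results = cluster_results("jaguar", [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar cat habitat",
    "Jaguar cat in the zoo",
])
print(sorted(results))  # ['car', 'cat']
```

The ambiguous query is split into two headings, which lets the searcher continue with only the meaning that was intended.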

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
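
How such an integration with removal of repeated results might work is sketched below in Python. The round-robin merging and the URL normalisation rule are assumptions chosen for this illustration, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing '/'."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists (round robin) and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
engine_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_b = ["http://dogpile.com/", "http://www.mamma.com"]
print(merge_results([engine_a, engine_b]))
# ['http://www.dogpile.com/', 'http://www.kartoo.com', 'http://www.mamma.com']
```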

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
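
The difference between a modified page and a new page can be shown with a minimal Python sketch that compares stored fingerprints of page contents. Fetching the pages is left out on purpose; the function names and the example.org URLs are hypothetical.

```python
import hashlib


def fingerprint(page_content):
    """Return a stable fingerprint of the page text."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_pages(previous, current_pages):
    """Compare stored fingerprints with freshly fetched pages.

    previous:      dict url -> fingerprint from the last check
    current_pages: dict url -> page text fetched now (fetching is not shown)
    Returns (new_urls, changed_urls, updated_fingerprints).
    """
    new, changed, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
    return new, changed, updated


# Hypothetical state from an earlier check and the pages as they are now.
stored = {"http://example.org/a": fingerprint("version 1"),
          "http://example.org/b": fingerprint("unchanged")}
now = {"http://example.org/a": "version 2",
       "http://example.org/b": "unchanged",
       "http://example.org/c": "brand new page"}
new_urls, changed_urls, stored = check_pages(stored, now)
print(new_urls)      # ['http://example.org/c']
print(changed_urls)  # ['http://example.org/a']
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.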

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most recent two years are available free of charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
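
Simultaneous, parallel searching can be sketched in Python as follows. The three catalogues are hypothetical in-memory stand-ins; a real client would send the query over the network (for instance via Z39.50 or HTTP), which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues of libraries and book dealers.
CATALOGUES = {
    "library_a": ["Marine ecology", "Coastal management"],
    "library_b": ["Marine ecology", "Ocean chemistry"],
    "bookshop_c": ["Fishing for beginners"],
}


def search_catalogue(name, query):
    """Return (catalogue name, titles that contain the query)."""
    titles = CATALOGUES[name]
    return name, [t for t in titles if query.lower() in t.lower()]


def search_all(query):
    """Send one query to all catalogues in parallel and collect the hits."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, query)
                   for name in CATALOGUES]
        return dict(f.result() for f in futures)


hits = search_all("marine")
print(hits)
```

Only a plain keyword match is offered for every target in this sketch, which mirrors the limited search options mentioned above: the common interface can use only what all target systems support.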

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
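
The use of thesaurus relations to formulate a query can be sketched in Python. The mini-thesaurus below is hypothetical and only uses the relation codes from the scheme (BT, NT, RT, UF); the OR syntax of the generated query is also an assumption, since each database has its own query language.

```python
# Hypothetical mini-thesaurus using the relation codes from the scheme:
# BT = broader term, NT = narrower term, RT = related term, UF = used for.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT"), thesaurus=THESAURUS):
    """Build an OR query from a term plus the terms reached via relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))                     # sea OR ocean OR "coastal sea"
print(expand_query("sea", relations=("BT",)))  # sea OR "water body"
```

Adding synonyms and narrower terms tends to increase recall, while replacing a vague word by a more suitable term from the thesaurus tends to increase precision.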

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
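
Both directions of citation searching can be sketched in Python on a small, hypothetical set of reference lists. Following the references of the seed document leads backwards in time; inverting the reference lists gives a simple citation index that leads forwards in time.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "new1"],
}


def snowball(start, cites=CITES):
    """Follow reference lists backwards in time (snowball effect)."""
    found, todo = [], list(cites.get(start, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(cites.get(doc, []))
    return found


def build_citation_index(cites=CITES):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("seed"))                # ['old1', 'old2', 'older']
print(build_citation_index()["seed"])  # ['new1', 'new2']
```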

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
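The variants listed above can be generated mechanically before running a link search. A minimal sketch in Python; the function name `url_variants` and the assumption that the default page is called index.html or index.htm are illustrative and not part of any search system.

```python
def url_variants(url):
    """Return the common spellings under which the same page may be linked."""
    base = url
    # Strip an explicit default page name, if present (assumed names).
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant is then submitted as a separate link query, and the result lists are combined.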

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
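Searching for a URL as an exact phrase can be sketched as follows. The endpoint `https://search.example/search` and the parameter name `q` are placeholders chosen for illustration; a real search engine has its own address and query syntax, described in its help pages.

```python
from urllib.parse import urlencode

def phrase_query(url):
    """Wrap the URL in double quotes so that it is searched as an exact phrase."""
    return '"' + url + '"'

def search_link(url, endpoint="https://search.example/search"):
    """Build a link to a (hypothetical) search form with the phrase as query."""
    return endpoint + "?" + urlencode({"q": phrase_query(url)})

link = search_link("http://www.computer.com/directory/subdirectory/web-page.html")
```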

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 168


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
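The "simply stated" rule above (more received links, higher rank) can be sketched in Python. The site names and link pairs are hypothetical, and counting raw inbound links is a deliberate simplification of whatever link analysis Google actually applies.

```python
from collections import Counter

def rank_by_inlinks(directory_sites, links):
    """Order directory entries by the number of links they receive.

    links is a list of (source, target) pairs; ties are broken alphabetically.
    """
    inlinks = Counter(target for source, target in links if source != target)
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))

# Hypothetical directory category and link data.
sites = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("z.example", "beta.example"),
]
ranked = rank_by_inlinks(sites, links)
```

With no link data at all, the function falls back to the purely alphabetical order of the original directory.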

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
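The mechanism described above, an index built in advance from page texts and then queried, can be sketched in Python as a small inverted index. The pages and their texts are hypothetical; a real system would collect them with a crawler and store far more than word-to-URL sets.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build the index in advance: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Answer a query from the index only (all query words must occur)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, as a crawler ("robot") might have collected them.
pages = {
    "http://a.example/": "Marine science and oceanography portal",
    "http://b.example/": "Fishing and marine biology",
    "http://c.example/": "Civil engineering directory",
}
index = build_index(pages)
hits = search(index, "marine biology")
```

Because the query is answered from the stored index, no page is fetched at search time. This is also why results can lag behind the live WWW.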

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible in many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
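The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch under assumptions: the page names, the damping value 0.85 and the function `link_rank` are chosen here for illustration, and the slide does not describe Google's actual algorithm in this detail.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page passes its score on to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link structure: "hub" receives many links and points to "d";
# "f" receives a single link from the little-cited page "e".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "e": ["f"]}
rank = link_rank(links)
```

Here "d" and "f" each receive exactly one link, yet "d" scores higher because its link comes from the heavily linked "hub".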

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
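Building such a query string can be sketched in Python. The helper name `expand_query` is invented for this illustration; it only formats the text of the query and does not contact any search system.

```python
def expand_query(words, expand):
    """Put a tilde in front of each word for which synonyms are requested."""
    return " ".join("~" + word if word in expand else word for word in words)

query = expand_query(["word1", "word2", "word3", "word4"], {"word2"})
```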

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.
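Manual expansion can be sketched as follows: the searcher picks the synonyms and the script only assembles the query. The use of `OR` between alternatives and of double quotes around multi-word terms is an assumption about the target system's syntax, and the example terms are hypothetical. Check the help pages of the search system that you use.

```python
def manual_expansion(concepts):
    """Combine hand-picked alternatives per concept into one query string.

    concepts is a list of lists: alternatives for the same concept are
    joined with OR, and the concepts are placed side by side.
    """
    parts = []
    for terms in concepts:
        quoted = ['"%s"' % term if " " in term else term for term in terms]
        if len(quoted) > 1:
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(quoted[0])
    return " ".join(parts)

query = manual_expansion([["ocean", "sea", "marine"], ["thesaurus"]])
```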

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
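
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant). The document identifiers, the two result lists and the function name `precision_recall` are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant documents exist for the topic.
relevant = {"d1", "d2", "d3", "d4"}
general_results = ["d1", "x1", "x2", "x3", "d2"]   # assumed output of a general index
specialised_results = ["d1", "d2", "d3", "x1"]     # assumed output of a specialised index

general = precision_recall(general_results, relevant)
specialised = precision_recall(specialised_results, relevant)
print(general, specialised)
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes; real result sets have to be judged one by one.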

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
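
When a search system offers no truncation, the searcher can list the word variants explicitly and combine them with OR. The following is a minimal sketch of that workaround; the word list and the function name `expand_truncation` are hypothetical, and a real searcher would take the variants from a dictionary or a thesaurus.

```python
def expand_truncation(stem, vocabulary):
    """Emulate a truncated term such as 'librar*' by listing the matching words."""
    variants = sorted(w for w in vocabulary if w.startswith(stem))
    return " OR ".join(variants)


# Hypothetical word list; 'liberty' does not start with the stem and is left out.
vocabulary = ["library", "libraries", "librarian", "liberty", "librarians"]
query = expand_truncation("librar", vocabulary)
print(query)
```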

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
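
To illustrate what a proximity operator such as NEAR does, the sketch below checks whether two words occur within a given number of words of each other in a text. It is only an illustration of the concept, for instance as a filter applied afterwards to retrieved texts; the function name `near` and the default distance are assumptions, not a feature of any particular search system.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


sentence = "Open access journals are free for every reader"
close = near(sentence, "open", "journals", max_distance=3)
far = near(sentence, "open", "reader", max_distance=3)
print(close, far)
```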

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
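
The idea of clustering results on the fly can be sketched with a very naive method: each retrieved snippet is filed under the word that it shares with the largest number of other snippets. This is NOT the algorithm used by Vivisimo, which is not described here; the stop-word list, the snippets and the function name `cluster_results` are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_words)
    # Document frequency: in how many snippets does each word occur?
    df = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Label = candidate word shared by most snippets; ties go to the
        # alphabetically first word because the candidates are sorted.
        label = max(sorted(words), key=lambda w: df[w]) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Works of the philosopher Pascal",
    "Free Pascal compiler for the programming language",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

Nothing is prepared before the search: the labels are derived only from the snippets that come back, which is what "on the fly" means here.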

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Reading the manual is recommended before using the
system.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
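
A client-based, multi-threaded search program sends the same query to several search systems at the same time and collects the answers. The sketch below shows only this principle; it is not how Copernic is implemented. The two "engines" are hypothetical stand-in functions that return invented URLs instead of issuing real HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, engines):
    """Send the same query to every engine in parallel; return {name: results}."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in engines (hypothetical): real code would contact a search server here.
def engine_one(query):
    return [f"http://example.org/one?q={query}"]


def engine_two(query):
    return [f"http://example.org/two?q={query}"]


results = search_all("pascal", {"one": engine_one, "two": engine_two})
print(results)
```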

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
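
The integration step with removal of repeated results can be sketched as follows: the ranked lists from the primary search systems are interleaved, and a URL is kept only the first time it appears. The normalisation rules (ignoring case of the host name, a leading "www." and a trailing slash) are assumptions chosen for the illustration, and the function names and example URLs under example.org are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce trivial URL variants to one comparison key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Interleave ranked lists, keeping only the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://example.org/a"]
engine_b = ["http://vub.ac.be/BIBLIO", "http://example.org/b"]
merged = merge_results(engine_a, engine_b)
print(merged)
```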

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
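
Tracking changes in an existing page can be sketched as storing a fingerprint of the page text and comparing it with the fingerprint of a later download. This is only one possible approach and not the documented method of any service named here. The function names are hypothetical, and fetching the page is left out so that the example runs without a network connection.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content, ignoring differences in whitespace."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with that of the newly downloaded text."""
    return fingerprint(current_text) != stored_fingerprint


stored = fingerprint("Opening hours:  9-17\n")
same = has_changed(stored, "Opening hours: 9-17")     # only whitespace differs
changed = has_changed(stored, "Opening hours: 9-19")  # content differs
print(same, changed)
```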

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
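
An interest profile stored as keywords can be matched against the titles of newly published books, as in the minimal sketch below. It does not describe how Amazon or any other service works; the profile, the book titles and the function name `matches_profile` are hypothetical.

```python
def matches_profile(title, keywords):
    """True if any profile keyword occurs as a word in the book title."""
    words = set(title.lower().replace(",", " ").replace(":", " ").split())
    return any(keyword.lower() in words for keyword in keywords)


profile = ["oceanography", "marine"]  # hypothetical stored interest profile
new_titles = [
    "Marine Ecology: An Introduction",
    "A History of Printing",
    "Coastal Oceanography",
]
alerts = [title for title in new_titles if matches_profile(title, profile)]
print(alerts)
```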

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
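
The use of a thesaurus before searching can be sketched as query expansion: the searcher looks up a term, collects synonyms (UF) and narrower terms (NT), and combines them with OR. The thesaurus entry below is invented for the illustration and is not taken from any of the systems mentioned; the function name `expand_query` is also hypothetical.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and its terms in the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


narrow = expand_query("sea", THESAURUS)
broad = expand_query("sea", THESAURUS, relations=("BT",))
print(narrow)
print(broad)
```

Adding synonyms and narrower terms mainly raises recall; choosing a more specific term from the thesaurus instead of the original word is the way to raise precision.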

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
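
The two directions of citation searching can be illustrated with a small Python sketch. The REFERENCES data and the document names are invented for the example; in practice this information would come from reference lists and from a citation index. The function snowball follows reference lists back in time, and citing_documents inverts them, which is in essence what a citation index does.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time (snowball effect)."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def citing_documents(target, references):
    """Invert the reference lists: which documents cite the target?"""
    return sorted(doc for doc, cited in references.items() if target in cited)


print(sorted(snowball("seed", REFERENCES)))
print(citing_documents("seed", REFERENCES))
```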

211

***-

Information retrieval:
using citations (scheme)
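[Scheme: a time axis (past, now, future) with the seed document.
Snowball citation searching leads from the seed document to the past;
citation indexing leads from the seed document to the future.]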

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
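
As a rough illustration of estimating the number of citations received, the following Python sketch counts how often each document appears in the reference lists of other documents. The data are invented for the example, and the count is only as complete as the set of reference lists that is available, which is why the result is an estimate.

```python
from collections import Counter

# Hypothetical data: each citing document maps to the documents it cites.
REFERENCE_LISTS = {
    "paper_1": ["article_a", "article_b"],
    "paper_2": ["article_a"],
    "paper_3": ["article_a", "article_a", "article_c"],
}


def citation_counts(reference_lists):
    """Count citing documents per cited document (duplicates count once)."""
    counts = Counter()
    for cited in reference_lists.values():
        counts.update(set(cited))
    return counts


counts = citation_counts(REFERENCE_LISTS)
print(counts.most_common())
```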

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
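
The variants listed above can be generated automatically. The Python sketch below is a minimal illustration; the function name url_variants is hypothetical, and it assumes that the default page of the site is called index.html, which is not true for every web server.

```python
def url_variants(url):
    """Return three common spellings of the same web address."""
    base = url
    if base.endswith("index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link search, and the results can be combined.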

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
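
When such a string search is sent to a search engine from a program, the URL has to be placed between quotation marks (exact phrase) and encoded. The Python sketch below shows this with the standard library only; the parameter name q is an assumption, since each search system defines its own parameter names.

```python
from urllib.parse import parse_qs, urlencode

target = "http://www.computer.com/directory/subdirectory/web-page.html"

# Quotation marks request an exact-phrase search in most search systems.
phrase = '"' + target + '"'

# "q" is a hypothetical parameter name; check the search system's manual.
encoded = urlencode({"q": phrase})
print(encoded)

# Decoding gives back the original phrase, which shows nothing was lost.
print(parse_qs(encoded)["q"][0])
```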

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
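
The simplified idea of ranking directory entries by the number of links received can be sketched in Python as follows. The site names and link data are invented, and the real link analysis by Google is more elaborate than this plain count.

```python
# Hypothetical link data: each page maps to the sites it links to.
LINKS = {
    "page_1": ["site_b", "site_c"],
    "page_2": ["site_b"],
    "page_3": ["site_b", "site_a"],
    "page_4": ["site_a"],
}


def rank_by_links_received(sites, links):
    """Sort sites by number of links received; ties stay alphabetical."""
    received = {site: 0 for site in sites}
    for targets in links.values():
        for target in set(targets):
            if target in received:
                received[target] += 1
    return sorted(sites, key=lambda site: (-received[site], site))


# An alphabetical directory listing becomes a ranked listing.
print(rank_by_links_received(["site_a", "site_b", "site_c"], LINKS))
```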

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: a global subject directory, covering the complete WWW,
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
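
The two components described above (a machine-made index built from page texts, and a search function on top of it) can be sketched in Python. This is a toy illustration: the PAGES dictionary stands in for pages collected by a robot, the URLs are invented, and real systems add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages, as if already collected from the Internet by a robot.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}


def build_index(pages):
    """Build an inverted index: each word maps to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine sea")))
```

The query is answered from the index alone, without visiting the pages again, which is why such systems can respond quickly.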

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
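
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a PageRank-style calculation in Python. This is a simplified sketch and not the actual algorithm used by Google: the link data, the damping value of 0.85 and the number of iterations are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score over the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical web of three pages: a and b link to c, and c links to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small example page c receives links from two pages and ranks highest, while page a ranks above page b because it is linked from the highly ranked page c.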

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
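
Adding the tilde to selected words can also be done by a small program, for instance when many queries have to be prepared. The Python sketch below only builds the query string as described above; the function name is hypothetical, and whether a search system honours the tilde depends on that system.

```python
def add_synonym_operator(query, words_to_expand):
    """Put a tilde in front of each selected word of the query."""
    return " ".join(
        "~" + word if word in words_to_expand else word
        for word in query.split()
    )


print(add_synonym_operator("word1 word2 word3 word4", {"word2"}))
```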

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:
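
The principle of categorizing results for an ambiguous query can be imitated with a short Python sketch. This is not how Teoma works internally: here the categories and their cue words are fixed by hand and the result texts are invented, whereas a real system derives the clusters automatically from the retrieved pages.

```python
# Hypothetical result snippets for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal language",
    "Pensees by the philosopher Pascal",
]

# Hand-made cue words per category (an assumption for this sketch).
CATEGORIES = {
    "philosopher": ["philosopher", "mathematician"],
    "programming": ["programming", "compiler", "language"],
}


def categorize(results, categories):
    """Assign each result to the first category whose cue word it contains."""
    clusters = {name: [] for name in categories}
    clusters["other"] = []
    for text in results:
        lowered = text.lower()
        for name, cues in categories.items():
            if any(cue in lowered for cue in cues):
                clusters[name].append(text)
                break
        else:
            clusters["other"].append(text)
    return clusters


for name, texts in categorize(RESULTS, CATEGORIES).items():
    print(name, len(texts))
```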

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system, which
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.
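The advice to combine several indexes can be illustrated with a small sketch. The index names and URL sets below are invented for illustration only; real Internet indexes do not expose their contents in this way.

```python
# Hypothetical sets of URLs that each index returns for one and the same query.
results_by_index = {
    "index_a": {"u1", "u2", "u3", "u4"},
    "index_b": {"u3", "u4", "u5"},
    "index_c": {"u1", "u6"},
}

def combined_coverage(results_by_index):
    """Return the union of all result sets and each index's share of that union."""
    union = set().union(*results_by_index.values())
    shares = {name: len(urls) / len(union) for name, urls in results_by_index.items()}
    return union, shares

union, shares = combined_coverage(results_by_index)
```

In this invented example no single index returns every document in the union, which is why a second or third index adds results.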

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
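Recall and precision can be made concrete with a short calculation. This is a minimal sketch using the usual definitions (recall = relevant documents retrieved / all relevant documents; precision = relevant documents retrieved / all retrieved documents); the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 3 documents are truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
```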

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
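Where a search system offers no truncation, the searcher can expand a truncated term by hand into an OR query. The sketch below assumes a small, invented word list and assumes that the target system accepts OR between words; it does not reproduce any feature of Google.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))

def or_query(words):
    """Join the expanded words into one OR query string."""
    return " OR ".join(words)

# Hypothetical vocabulary; in practice the searcher has to think of the variants.
vocabulary = ["library", "libraries", "librarian", "liberty", "literature"]
query = or_query(expand_truncation("librar*", vocabulary))
```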

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
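What a proximity operator such as NEAR does can be sketched as a local filter applied to text that has already been retrieved. The function name and the default distance of 5 words are assumptions made for this illustration.

```python
import re

def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "open access journals in marine science"
close = near(sentence, "open", "journals", max_distance=2)
far = near(sentence, "open", "science", max_distance=2)
```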

96

**--

Meta-search systems:
scheme 1
[Diagram: User → (In / Out) → client computer + WWW client program → WWW server computer → Internet / WWW → WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User → (In / Out) → client computer + multi-threaded Internet search client program → Internet / WWW → WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User → (In / Out) → client computer + WWW client program → WWW server computer → Internet / WWW → WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
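Clustering "on the fly" can be illustrated with a deliberately simple sketch that works only on the snippets returned for one query. It is not the method used by Vivisimo, which is not described here; the stop word list, the function names and the snippets are assumptions, and the query "pascal" follows the ambiguity example used earlier for Teoma.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was"}

def tokens(snippet, query):
    """Lower-cased words of a snippet, without stop words and the query word."""
    words = {w.strip(".,;:!?").lower() for w in snippet.split()}
    return words - STOPWORDS - {query.lower(), ""}

def cluster_results(snippets, query):
    """Group snippets under the term they share with the most other snippets."""
    token_sets = [tokens(s, query) for s in snippets]
    doc_freq = Counter(t for ts in token_sets for t in ts)
    clusters = defaultdict(list)
    for snippet, ts in zip(snippets, token_sets):
        # Pick the most widely shared term; ties are broken alphabetically.
        label = min(ts, key=lambda t: (-doc_freq[t], t)) if ts else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal the philosopher of the Pensees",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
]
clusters = cluster_results(snippets, "pascal")
```

Nothing is indexed in advance: the headings are derived from the retrieved results alone.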

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User → (In / Out) → client computer + multi-threaded Internet search client program → Internet / WWW → WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
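The time saving comes from sending one query to several sources at the same moment. A minimal sketch of this multi-threaded idea follows; the two "engines" are hypothetical stand-in functions that return invented URLs, not real search services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems; each returns a list of URLs.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]

def engine_b(query):
    return [f"http://b.example/{query}/1"]

def meta_search(query, engines):
    """Send the same query to every engine in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("plankton", {"a": engine_a, "b": engine_b})
```

The user formulates the query once; the total waiting time is roughly that of the slowest source rather than the sum of all sources.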

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
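Integration with removal of repeated results could look roughly like this. The round-robin merging and the URL normalisation rules are assumptions chosen for the sketch; actual meta-search systems may rank and compare results differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivial URL variants: host case, 'www.' prefix, trailing slash.
    The scheme and any query string are ignored in this simplified version."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

merged = merge_results([
    ["http://www.dogpile.com/", "http://vivisimo.com/"],
    ["http://dogpile.com", "http://www.kartoo.com/"],
])
```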

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
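The principle of a server-based alert service can be sketched as follows: the stored query is run again at intervals and only results not reported before are passed on. The function name and the in-memory dictionary are assumptions for this illustration; a real service would keep this state on its server and send the new results by email.

```python
def new_hits(stored_query, current_results, seen):
    """Return results not reported before for this query and remember them."""
    already = seen.setdefault(stored_query, set())
    fresh = [url for url in current_results if url not in already]
    already.update(fresh)
    return fresh

seen = {}  # maps each stored query to the URLs already reported
first = new_hits("open access", ["u1", "u2"], seen)
second = new_hits("open access", ["u1", "u2", "u3"], seen)
```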

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
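Tracking modifications in one existing page can be reduced to comparing a stored fingerprint of the page text with a fresh one. This sketch uses invented page text instead of a real download, and the decision to ignore whitespace differences is an assumption.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page text after collapsing all runs of whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_changed(old_fingerprint, page_text):
    """Compare a stored fingerprint with the fingerprint of the current text."""
    return fingerprint(page_text) != old_fingerprint

stored = fingerprint("Library opening hours: 9-17")
same = has_changed(stored, "Library  opening hours:\n9-17")
different = has_changed(stored, "Library opening hours: 9-18")
```

Finding entirely new pages is a different task, because it needs a search system rather than a known address.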

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced
with added descriptors (words and terms) taken from
a controlled list, the searcher can first consult
one or several thesaurus systems to find more
and more suitable words and terms,
and then use these words and terms to
formulate a query for that database (to increase recall
and precision).
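Using a thesaurus before searching can be sketched as query expansion. The thesaurus entry below is invented for illustration and uses the relation codes BT, NT, RT and UF; the quoting of multi-word terms and the OR syntax are assumptions about the target database.

```python
# Hypothetical thesaurus entry; real thesaurus systems hold many such records.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Build an OR query from a term plus the terms linked by the chosen relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly raises recall; leaving out the broader term (BT), as done by default here, avoids lowering precision.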

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
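
The two directions of citation searching can be illustrated with a small sketch. The document identifiers and the citation data below are invented for the example; a real citation index is of course built from the reference lists of many publications.

```python
# Hypothetical data: each document maps to the older documents that it cites.
CITES = {
    "seed_2000": ["a_1995", "b_1998"],
    "b_1998": ["a_1995", "c_1990"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "d_2003"],
}

def snowball(seed, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in cites.get(doc, [])} - found
        found |= frontier
    return sorted(found)

def build_citation_index(cites):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for doc, refs in cites.items():
        for ref in refs:
            index.setdefault(ref, []).append(doc)
    return index

print(snowball("seed_2000", CITES))
# ['a_1995', 'b_1998', 'c_1990']
print(sorted(build_citation_index(CITES)["seed_2000"]))
# ['d_2003', 'e_2004']
```

Snowballing only needs the documents themselves, while the forward direction requires an index that somebody has built in advance.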

211

***-

Information retrieval:
using citations (scheme)
[Scheme: seed document on a time axis (past, now, future);
snowball citation searching points to the past,
citation indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
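
Counting received citations, per article and per author, comes down to simple tallying once the results of a citation search are available. The sketch below assumes invented article identifiers, author names and citation pairs; it only shows the arithmetic, not how the data are obtained.

```python
from collections import Counter

# Hypothetical data: the authors of each article,
# and one (citing article, cited article) pair per citation.
AUTHORS = {"art1": ["Smith"], "art2": ["Smith", "Jones"], "art3": ["Jones"]}
CITATIONS = [("art2", "art1"), ("art3", "art1"), ("art3", "art2")]

def citations_per_article(citations):
    """Count how many times each article is cited."""
    return Counter(cited for _citing, cited in citations)

def citations_per_author(citations, authors):
    """Credit every author of a cited article with one citation."""
    counts = Counter()
    for _citing, cited in citations:
        for author in authors.get(cited, []):
            counts[author] += 1
    return counts

print(citations_per_article(CITATIONS)["art1"])          # 2
print(citations_per_author(CITATIONS, AUTHORS)["Smith"])  # 3
```

Such counts remain estimates: they depend on the coverage of the citation database that was searched.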

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.
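
The analogy between links and citations can be made concrete with a small sketch that extracts the links from a few pages and lists the pages that link to a known URL. The pages and URLs are invented, and the pages are held in memory; a real search engine does this on the basis of its own crawled database.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def pages_linking_to(target, pages):
    """Return the URLs of the pages that contain a link to `target`."""
    result = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            result.append(url)
    return sorted(result)

# Hypothetical pages (URL -> HTML text).
PAGES = {
    "http://a.example/": '<p>See <a href="http://target.example/page.html">this</a>.</p>',
    "http://b.example/": '<a href="http://other.example/">other</a>',
    "http://c.example/": '<a href="http://target.example/page.html">again</a>',
}

print(pages_linking_to("http://target.example/page.html", PAGES))
# ['http://a.example/', 'http://c.example/']
```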

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
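
Generating the variants of an address before a link search can be automated. The function below is a minimal sketch; the function name and the list of default index file names are assumptions for the example.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the URL plus the variants without index file and without trailing slash."""
    variants = [url]
    directory = url
    for name in index_names:
        if url.endswith("/" + name):
            directory = url[: -len(name)]  # keep the trailing slash
            break
    for candidate in (directory, directory.rstrip("/")):
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants

for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
# //web-server-computer.country/website/index.html
# //web-server-computer.country/website/
# //web-server-computer.country/website
```

The sketch only shortens an address; it does not add a trailing slash or an index file name to an address that lacks them.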

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 170


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory can lead to
• the complete WWW
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
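
The principle of such an index can be sketched as a so-called inverted index: for every word, the system stores the set of pages in which it occurs, so that a query is answered from the index and not by visiting the pages again. The pages below are invented and held in memory; the crawling step is left out, and the real systems are far more elaborate.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain all words of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical pages (URL -> text), as if collected by a crawler.
PAGES = {
    "http://a.example/": "Marine biology of the North Sea",
    "http://b.example/": "Hydrology and marine science",
    "http://c.example/": "Civil engineering",
}

INDEX = build_index(PAGES)
print(search(INDEX, "marine science"))
# ['http://b.example/']
```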

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words in a query.)
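
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified, iterative link-score calculation. This is a textbook-style sketch under stated assumptions (an invented link graph, a damping factor of 0.85, a fixed number of iterations); it is not the algorithm that Google actually uses.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Compute simplified link-based scores for the pages in a link graph.

    `links` maps each page to the list of pages that it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores

# Hypothetical link graph: a and b link to c, c links to d.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = link_scores(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, page c scores higher than a and b because two pages point to it, and page d benefits from being linked by the well-linked page c.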

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
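
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are invented for the example, and the output syntax with OR and parentheses is only one possible notation; how the search engine itself selects synonyms is not documented here.

```python
# Hypothetical synonym table; a real system uses its own, much larger word lists.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_tilde_query(query, synonyms=SYNONYMS):
    """Replace each ~word in the query by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_tilde_query("cheap ~car rental"))
# cheap (car OR automobile OR vehicle) rental
```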

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
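
The terms recall and precision used above can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the sample documents are hypothetical, and it assumes that the complete set of relevant documents is known, which is rarely the case on the open WWW.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents returned by the search system
    relevant:  documents that actually answer the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 of them relevant,
# out of 6 relevant documents that exist in total.
retrieved = ["a", "b", "c", "x"]
relevant = ["a", "b", "c", "d", "e", "f"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A specialised system would aim to raise both numbers: fewer irrelevant hits in the result list (precision) and fewer relevant documents missed (recall).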

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
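
When a search system does not support truncation, a searcher can approximate it by listing the word variants explicitly and joining them with OR. The following is a minimal sketch under stated assumptions: the word list is hypothetical and supplied by the searcher, and the target search system is assumed to accept OR between terms.

```python
def expand_truncation(stem, vocabulary):
    """Return the words in `vocabulary` that start with `stem`, sorted."""
    stem = stem.lower()
    return sorted({w.lower() for w in vocabulary if w.lower().startswith(stem)})


def build_or_query(stem, vocabulary):
    """Build an explicit OR query that stands in for the truncated form `stem*`."""
    variants = expand_truncation(stem, vocabulary)
    return " OR ".join(variants) if variants else stem


# Hypothetical word list supplied by the searcher.
words = ["library", "libraries", "librarian", "liberty", "Librarians"]
print(build_or_query("librar", words))
```

The searcher still has to think of the variants in advance, which is exactly the work that built-in truncation or stemming would take over.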

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
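
Clustering on the fly can be illustrated with a deliberately simple rule: each result is grouped under the first distinguishing word that appears in its snippet, and the results are inspected only at query time. This is not Vivisimo's algorithm, which these slides do not describe; the label list, titles and snippets below are hypothetical.

```python
from collections import defaultdict


def cluster_results(results, labels):
    """Group (title, snippet) results under the first label found in the snippet.

    Results that match no label go into the cluster "other".
    """
    clusters = defaultdict(list)
    for title, snippet in results:
        text = snippet.lower()
        label = next((lab for lab in labels if lab in text), "other")
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical results for the ambiguous query "pascal".
results = [
    ("Blaise Pascal biography", "French philosopher and mathematician"),
    ("Pascal tutorial", "Learn the Pascal programming language"),
    ("Pensées", "Work by the philosopher Blaise Pascal"),
    ("Pascal (unit)", "SI unit of pressure"),
]
print(cluster_results(results, ["philosopher", "programming"]))
```

A real clustering engine would also have to derive the headings from the retrieved results themselves instead of receiving them as input.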

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]
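
The client-based scheme can be sketched as a program that sends one query to several search systems in parallel and collects the answers. The sketch below uses stand-in functions instead of real network requests; the engine names and the returned URLs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    # Stand-in for a request to a first search system (hypothetical data).
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    # Stand-in for a request to a second search system (hypothetical data).
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send `query` to every engine in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("pascal", {"A": engine_a, "B": engine_b})
for name, urls in results.items():
    print(name, urls)
```

Because the requests run side by side, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.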

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
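
The integration step with removal of repeated results can be sketched as follows. This is one possible approach, not the method of any particular meta-search system: the ranked lists are interleaved and a URL is dropped when a normalised form of it has already been seen. The normalisation rules are assumptions chosen for illustration.

```python
def normalize(url):
    """Normalize a URL so trivially different spellings compare equal.

    Lowercasing the whole URL is a simplification: paths can be case-sensitive.
    """
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(*result_lists):
    """Interleave several ranked lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_1 = ["http://www.vub.ac.be/BIBLIO/", "http://www.doaj.org/"]
list_2 = ["http://doaj.org", "http://www.scirus.com/"]
print(merge_results(list_1, list_2))
```

Interleaving by rank keeps the top result of every source near the top of the merged list, whichever source happens to be listed first.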

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram labels: Internet (telnet, ftp, ...); WWW; static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes; databases and file archives accessible through the Internet (CGI, ASP,...); information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
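
Tracking modifications in known pages can be sketched with content fingerprints: the service stores a digest of each page and compares it with the digest computed at the next visit. The sketch below is an assumption about how such a tracker could work, not a description of any named service; the URLs and page texts are hypothetical, and a real service would fetch the pages over HTTP.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_for_changes(stored, current_pages):
    """Compare current page texts with stored fingerprints.

    stored:        dict mapping URL -> fingerprint from the previous visit
    current_pages: dict mapping URL -> page text fetched now
    Returns (modified_urls, new_urls) and updates `stored` in place.
    """
    modified, new = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            modified.append(url)
        stored[url] = digest
    return modified, new


# Hypothetical pages standing in for two successive visits.
stored = {}
check_for_changes(stored, {"http://example.org/a": "version 1"})
modified, new = check_for_changes(
    stored,
    {"http://example.org/a": "version 2", "http://example.org/b": "hello"},
)
print(modified, new)
```

Storing only the digest keeps the tracker small, at the cost of not being able to show what changed.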

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
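
An alert service built on stored queries can be sketched as a loop that re-runs each query and reports only the results that were not reported before. This is an illustrative sketch, not the implementation of Google Alert; the `search` function, the query and the URLs are hypothetical stand-ins for a call to an external Internet index.

```python
def new_hits(previous_urls, current_urls):
    """Return URLs in the current result list that were not seen before, in order."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]


def run_alerts(stored_queries, search, history):
    """Re-run each stored query and collect results not reported earlier.

    stored_queries: list of query strings kept on the server
    search:         function mapping a query to a list of URLs (an assumption)
    history:        dict mapping query -> URLs already reported; updated in place
    """
    alerts = {}
    for query in stored_queries:
        current = search(query)
        fresh = new_hits(history.get(query, []), current)
        if fresh:
            alerts[query] = fresh
        history[query] = history.get(query, []) + fresh
    return alerts


# Hypothetical index that gains a page between two runs.
index = {"open access": ["http://example.org/1"]}
history = {}
first = run_alerts(["open access"], lambda q: index.get(q, []), history)
index["open access"].append("http://example.org/2")
second = run_alerts(["open access"], lambda q: index.get(q, []), history)
print(first, second)
```

In a real service the content of `alerts` would be sent to the subscriber by email after each run.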

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or thesaurus for searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
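
A minimal sketch of this use of a thesaurus, in Python. The mini-thesaurus, the function names and the AND/OR query syntax are all assumptions made for illustration; the syntax that a given database accepts should be checked in its help pages.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonyms
        "BT": ["water body"],       # broader terms
        "NT": ["coastal sea"],      # narrower terms
        "RT": ["marine science"],   # related terms
    },
}

def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms linked by the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms

def build_query(terms):
    """Combine the variants of each concept with OR, and the concepts with AND."""
    groups = []
    for term in terms:
        # Multi-word terms are quoted so that they are searched as a phrase.
        variants = ['"%s"' % v if " " in v else v for v in expand_term(term)]
        groups.append("(" + " OR ".join(variants) + ")" if len(variants) > 1 else variants[0])
    return " AND ".join(groups)

print(build_query(["sea", "pollution"]))
# prints: (sea OR ocean OR "coastal sea") AND pollution
```

Adding synonyms and narrower terms with OR tends to raise recall; choosing which relations to include is how the searcher keeps precision under control.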

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
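
The two directions of citation searching can be sketched in Python as follows. The document names and the citation data are hypothetical; a real citation index holds the same kind of relation for millions of publications.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}

def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in REFERENCES.get(doc, []):
                if cited not in found:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found

def cited_by(target):
    """What a citation index provides: later documents that cite `target`."""
    return sorted(doc for doc, refs in REFERENCES.items() if target in refs)

print(snowball("seed"))   # older publications: ['old_a', 'old_b', 'older_c']
print(cited_by("seed"))   # more recent publications: ['new_x', 'new_y']
```

The length of the list returned by `cited_by` is the number of citations received by the seed document within this (tiny, assumed) collection.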

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
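
As an illustration, the variants can be generated with a small Python helper. The function name is hypothetical, and the assumption that the default page is called index.html does not hold for every web server.

```python
def url_variants(url):
    """Return common spellings of one page address for use in link searches."""
    # Drop the scheme (http://, https://) if present.
    base = url.split("://", 1)[-1]
    # Reduce the address to its directory form without a trailing slash.
    if base.endswith("index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link search, and the results are merged by hand.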

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has added many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
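
The "simply stated" version of this ranking can be sketched in Python. The site names and link data are hypothetical, and counting incoming links is only the simplified idea described above, not Google's actual algorithm.

```python
# Hypothetical link data: each site maps to the sites it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}

def rank_by_inlinks(category_sites):
    """Order the sites of one directory category by the number of links received."""
    counts = {site: 0 for site in category_sites}
    for source, targets in LINKS.items():
        for target in set(targets):
            # Count each linking site once, and ignore links from a site to itself.
            if target in counts and target != source:
                counts[target] += 1
    # Most-linked first; ties fall back to alphabetical order, as in DMOZ.
    return sorted(category_sites, key=lambda site: (-counts[site], site))

print(rank_by_inlinks(["a.example", "c.example", "d.example"]))
# prints: ['c.example', 'd.example', 'a.example']
```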

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
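
A minimal Python sketch of this mechanism. The pages are a hypothetical in-memory stand-in for the web (a real robot would collect them over HTTP), and treating a multi-word query as an implicit AND is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the web: URL -> page text.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}

def build_index(pages):
    """Done ahead of time by the robot: map each word to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Done at query time: consult the index only, never the live pages."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Keep only the URLs that contain every query word (implicit AND).
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

index = build_index(PAGES)
print(search(index, "marine"))          # both pages that mention "marine"
print(search(index, "marine science"))  # only http://example.org/b
```

Because `search` consults only the prebuilt index, answering a query is fast, but pages that changed after the last visit of the robot are found under their old contents.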

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi --
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
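
Both ranking criteria (many pages point to it, "important" pages point to it) can be illustrated with a simplified PageRank-style calculation in Python. This is a sketch under stated assumptions, not Google's actual algorithm: the damping value, the number of iterations and the link data are illustrative choices.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links they receive, weighted by the score of the linking page."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # A page without outgoing links spreads its score evenly over all pages.
            targets = links.get(page) or pages
            share = damping * score[page] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score

# Hypothetical link data: pages a and b point to c, and c points to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example page c ranks first because two pages point to it, and page a ranks above page b because its single incoming link comes from the "important" page c.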

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches until 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
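
Recall and precision can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers and the two result lists are hypothetical, and the set of relevant documents is assumed to be known in advance, which is rarely the case in practice.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers, for illustration only.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d7", "d8", "d9", "d2"]
specialised_engine = ["d1", "d2", "d3", "d9"]

print(precision_recall(general_engine, relevant_docs))      # (0.4, 0.5)
print(precision_recall(specialised_engine, relevant_docs))  # (0.75, 0.75)
```

In this made-up case the specialised system scores higher on both measures, which is the situation the slide describes.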

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
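
When truncation is not available, the searcher can spell out the word variants and combine them with OR. The sketch below builds such a query string; it assumes that the target search system accepts an OR operator, and the function name and the list of endings are hypothetical.

```python
def expand_truncation(stem, endings):
    """Build an OR query that spells out the variants of a truncated word."""
    variants = [stem + ending for ending in endings]
    return " OR ".join(variants)


# Replaces a truncated query such as librar* (the endings are chosen by the searcher).
query = expand_truncation("librar", ["y", "ies", "ian", "ians"])
print(query)  # library OR libraries OR librarian OR librarians
```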

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
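
To show what a proximity operator such as NEAR does in principle, here is a minimal sketch that tests a locally available text. It is not how any particular search system implements proximity searching; the function name and the default distance are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Open access journals are free for every reader"
print(near(sentence, "open", "free", max_distance=4))  # True
print(near(sentence, "open", "free", max_distance=3))  # False
```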

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
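
Clustering on the fly can be sketched in a few lines. This is a deliberately naive illustration and not the method used by Vivisimo: each retrieved snippet is filed under the term that it shares with the most other snippets. The snippets, the stop-word list and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "by"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that is not in the query."""
    query_terms = set(query.lower().split())
    token_sets = []
    for snippet in snippets:
        tokens = set(re.findall(r"[a-z]+", snippet.lower()))
        token_sets.append(tokens - STOPWORDS - query_terms)
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for tokens in token_sets for term in tokens)
    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        if tokens:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(tokens, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Free Pascal programming compiler",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, members)
```

The clusters are computed only from the results that were retrieved for this one query, so no pre-processing of the document collection is needed.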

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information source would otherwise have to be searched one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
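
The time saving comes from sending one query to several search systems in parallel. A minimal sketch of this multi-threaded approach follows; the two search functions are hypothetical stand-ins that return fixed lists, where a real program would contact real search systems.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems;
# each returns a ranked list of URLs for the query.
def search_engine_1(query):
    return ["http://example.org/a", "http://example.org/b"]


def search_engine_2(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {engine.__name__: pool.submit(engine, query) for engine in engines}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", [search_engine_1, search_engine_2])
for name, urls in results.items():
    print(name, urls)
```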

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
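
Integration with removal of repeated results can be sketched as follows. The ranked lists are interleaved and a result is dropped when its normalised URL has already been seen. The normalisation rule (ignore the scheme, a leading "www." and a trailing slash) is a simplifying assumption, not a description of any real meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing slash.

    The scheme and any query string are ignored (a simplification).
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated results."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


list_1 = ["http://www.dogpile.com/", "http://vivisimo.com/"]
list_2 = ["http://dogpile.com", "http://www.kartoo.com"]
print(merge_results([list_1, list_2]))
# ['http://www.dogpile.com/', 'http://vivisimo.com/', 'http://www.kartoo.com']
```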

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
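
The difference between a modified page and a new page can be illustrated with stored fingerprints. In this sketch the page contents are passed in as plain strings, so that it runs without network access; a real alert service would fetch the pages itself. The function names and the example URLs are hypothetical.

```python
import hashlib


def fingerprint(page_content):
    """Short, fixed-length summary of a page's content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_pages(stored_fingerprints, current_pages):
    """Compare current pages with the stored fingerprints.

    Returns (new, modified) lists of URLs and updates the stored fingerprints.
    """
    new, modified = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored_fingerprints:
            new.append(url)
        elif stored_fingerprints[url] != digest:
            modified.append(url)
        stored_fingerprints[url] = digest
    return new, modified


stored = {}
first = check_pages(stored, {"http://example.org/page1": "version 1"})
second = check_pages(stored, {
    "http://example.org/page1": "version 2",
    "http://example.org/page2": "brand new",
})
print(first)   # (['http://example.org/page1'], [])
print(second)  # (['http://example.org/page2'], ['http://example.org/page1'])
```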

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
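
How a stored interest profile can be matched against new book records is sketched below. The record fields, the profile layout and the sample data are hypothetical; they do not describe the alerting service of any bookshop.

```python
def matches_profile(book, profile):
    """True if a new book fits the profile by keyword or by subject category."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(keyword.lower() in title_words
                      for keyword in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical profile and new book records, for illustration only.
profile = {"keywords": ["oceanography", "thesaurus"],
           "categories": ["Marine science"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth science"},
    {"title": "Coastal Ecosystems", "category": "Marine science"},
    {"title": "Medieval Cooking", "category": "History"},
]

alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)  # ['Introduction to Oceanography', 'Coastal Ecosystems']
```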

155

****

Online Public Access Catalogues of
libraries
• The catalogues of libraries are useful mainly for finding
older books.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced
with added descriptors (words and terms) taken from a
controlled list, the searcher can first consult one or
several thesaurus systems to find more, and more suitable,
words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
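
The procedure described above can be sketched in a few lines of Python. This is only an illustration: the mini-thesaurus, its entries and the function names expand_term and build_query are hypothetical, and a real thesaurus system would supply the related terms.

```python
# Hypothetical mini-thesaurus using the relation codes BT, NT, RT and UF.
# The entries are illustrative only, not taken from any real thesaurus.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term followed by its thesaurus terms for the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(term, relations=("UF", "NT")):
    """Combine the expanded terms with OR; multi-word terms are quoted as phrases."""
    quoted = ['"%s"' % t if " " in t else t for t in expand_term(term, relations)]
    return " OR ".join(quoted)
```

Adding synonyms (UF) and narrower terms (NT) mainly serves recall; leaving out broader terms (BT) by default is a design choice meant to protect precision.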

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
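
The two directions of citation searching can be illustrated with a small Python sketch. The document names and the REFERENCES table are invented for the example, and the function names are hypothetical; a real citation index holds the same kind of information at a much larger scale.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the (older) documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, references):
    """Follow reference lists back in time, repeatedly (the snowball effect)."""
    found, todo = [], [seed]
    while todo:
        doc = todo.pop()
        for cited in references.get(doc, []):
            if cited != seed and cited not in found:
                found.append(cited)
                todo.append(cited)
    return sorted(found)


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    cited_by = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            cited_by[cited].append(citing)
    return cited_by


def citing_documents(seed, references):
    """Look forward in time: which more recent documents cite the seed?"""
    return sorted(build_citation_index(references).get(seed, []))
```

Reference lists are available in the seed document itself, whereas the inverted table has to be built from many documents; this is why forward searching needs a citation index.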

211

***-

Information retrieval:
using citations (scheme)
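[Scheme: a time axis running from the past, through now, to the
future. Starting from the seed document, snowball citation
searching leads back to older publications, and citation
indexing leads forward to more recent publications.]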

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles
that contain citations of a particular document or author,
for instance to study the impact in science of a particular
idea or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
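
As a rough sketch, the variants listed above can be generated automatically. The function name url_variants is hypothetical, and the default file names index.html and index.htm are assumptions; the resulting strings still have to be wrapped in whatever link-search syntax the chosen search system requires.

```python
def url_variants(url):
    """Return the common spellings under which one page may be linked."""
    # Drop the scheme so that matching does not depend on "http://".
    bare = url.split("://", 1)[-1]
    variants = [bare]
    # A default file name such as index.html is often omitted in links.
    for default_name in ("index.html", "index.htm"):
        if bare.endswith("/" + default_name):
            bare = bare[: -len(default_name)]
            variants.append(bare)
            break
    # A directory can be written with or without the trailing slash.
    if bare.endswith("/"):
        variants.append(bare.rstrip("/"))
    elif "." not in bare.rsplit("/", 1)[-1]:
        variants.append(bare + "/")
    return variants
```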

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can
NOT ONLY use an Internet index search engine with
specialised searching in the hyperlink fields of web pages,
but ALSO search more “normally” for web documents/pages
that contain words or exact phrases that occur in the URL
of that document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

Interruptions, questions, remarks and discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
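
The "simply stated" version of this ranking can be sketched as follows. This is only an illustration of counting incoming links, with invented site names and a hypothetical function name; it is not a description of how Google actually computes its directory ranking.

```python
from collections import Counter


def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by the number of links received, most-linked first.

    links is a list of (source, target) pairs; duplicate pairs and links
    from a site to itself are ignored.
    """
    inlinks = Counter()
    for source, target in set(links):
        if source != target:
            inlinks[target] += 1
    # Ties fall back to alphabetical order, as in the plain DMOZ listing.
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))
```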

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
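
A minimal sketch of the two components, assuming that the robot has already collected the page texts: an index from words to URLs, and a search function that consults only this index, never the live Internet. The function names and sample pages are hypothetical, and combining the query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs whose text contains that word.

    pages is a dict mapping a URL to the text collected for it.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all query words (AND is assumed here)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

Because a query is answered from the prepared index, the search is fast, but it can only find pages as they were when the robot last collected them.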

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista has been owned by one U.S.
Internet company, Yahoo!, which also owns the other
leading Internet database and search engine All the Web
and the Internet database Inktomi.

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
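
The two ranking criteria (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified, PageRank-style iteration. This is a textbook sketch under stated assumptions, not Google's actual implementation: the function name, the damping value 0.85 and the number of iterations are all assumptions.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the score of the linking page.

    links is a dict mapping a page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores
```

In a small example where pages a and b both link to c, and c links to a, page c scores highest (two incoming links), and a scores higher than b because its single incoming link comes from the important page c.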

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
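
What such an expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the sketch only shows the principle of replacing a marked word by a group of alternatives; how Google selects synonyms internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS = {"sea": ["ocean"], "car": ["automobile", "auto"]}


def expand_tilde_query(query, synonyms=SYNONYMS):
    """Replace each word marked with ~ by an OR group of the word and its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base, [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is kept as it is.
                parts.append(base)
        else:
            parts.append(word)
    return " ".join(parts)
```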

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies; the
user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
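
Recall and precision, the two quality measures named above, can be computed as in the following minimal sketch. The function name and the sample document identifiers are assumptions for illustration; in practice the set of all relevant documents is rarely known exactly.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are actually relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, round(r, 2))
```

A specialised index tends to raise precision because fewer off-topic documents are indexed at all.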

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
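
For comparison, the sketch below shows what manual right-truncation does in a retrieval system that supports it. The vocabulary, the * symbol and the function name are assumptions for illustration; real systems differ in syntax and in how they look up matching words.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matching a right-truncated pattern such as 'librar*'."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))

# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty"}
print(" OR ".join(expand_truncation("librar*", vocabulary)))
# prints: librarian OR libraries OR library
```

Without truncation, the searcher has to type such word variants explicitly and combine them with OR.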

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
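
A proximity operator such as NEAR can be illustrated with the following minimal sketch. It assumes plain text split on white space, without handling of punctuation; the function name near and the default distance are hypothetical and do not describe any particular retrieval system.

```python
def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

sentence = "open access journals are free"
print(near(sentence, "open", "free", max_distance=4))  # words 0 and 4: distance 4
print(near(sentence, "open", "free", max_distance=3))
```

A plain AND query would accept both cases, even when the two words are far apart and unrelated in the document.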

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
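
The idea of clustering results on the fly can be sketched as follows. This is a deliberately naive grouping, assumed here only for illustration: each result is filed under the word it shares with the most other results. It is not the algorithm used by Vivisimo, which the source does not describe; the snippets and the stop word list are made up.

```python
from collections import Counter, defaultdict

STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to"})

def cluster_by_frequent_term(query, snippets):
    """Group result snippets under their most widely shared non-query word."""
    query_words = set(query.lower().split())
    tokenized = [
        [w for w in s.lower().split() if w not in STOPWORDS and w not in query_words]
        for s in snippets
    ]
    # Document frequency: in how many snippets does each word occur?
    doc_freq = Counter(w for words in tokenized for w in set(words))
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Highest document frequency wins; ties are broken alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "pascal programming language tutorial",
    "blaise pascal philosopher",
    "pascal language compiler",
    "pascal philosopher and mathematician",
]
for label, members in cluster_by_frequent_term("pascal", snippets).items():
    print(label, "->", members)
```

Only the retrieved snippets are analysed, at search time, so no pre-processing of the whole document collection is needed.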

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
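
The scheme above, a client program that queries several search systems in parallel, can be sketched with threads as follows. The two engine functions are stand-ins invented for illustration (they return fixed example URLs); a real client would send HTTP requests to the search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real search systems (hypothetical, no network access).
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]

def engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]

def multi_search(query, engines):
    """Send the same query to all engines in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = multi_search("thesaurus", {"A": engine_a, "B": engine_b})
for name, urls in results.items():
    print(name, urls)
```

Because the requests run in parallel, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.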

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
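
The integration of results with removal of repeated results can be sketched as follows. The round-robin interleaving is an assumption chosen for simplicity; actual meta-search systems may rank the merged results differently, and the URLs are made-up placeholders.

```python
from itertools import zip_longest

def merge_results(result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    seen = set()
    merged = []
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

from_system_1 = ["u1", "u2", "u3"]
from_system_2 = ["u2", "u4"]
print(merge_results([from_system_1, from_system_2]))
# prints: ['u1', 'u2', 'u4', 'u3']
```

Comparing exact URLs is the simplest test for a repeated result; the same page reached through different URLs would not be detected this way.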

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes
Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
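
Tracking modifications in an existing page can be sketched as follows: a fingerprint of the page content is stored, and a later copy is compared against it. This is an assumption about how such a service could work, not a description of any system named here; fetching the page over the network is left out, and the page texts are placeholders.

```python
import hashlib

def fingerprint(page_content: str) -> str:
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint: str, new_content: str) -> bool:
    """True if the new copy of the page differs from the stored version."""
    return fingerprint(new_content) != stored_fingerprint

stored = fingerprint("Opening hours: 9-17")
print(has_changed(stored, "Opening hours: 9-17"))
print(has_changed(stored, "Opening hours: 9-19"))
```

Only the fingerprint needs to be kept between visits, not the full page. Finding entirely new pages is a different problem, which requires a search index rather than a stored fingerprint.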

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
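
Matching new books against a stored interest profile can be sketched as follows. The record fields, the profile layout and the sample books are assumptions for illustration and do not describe the alert service of any particular bookshop.

```python
def matches_profile(book, profile):
    """True if the title contains a profile keyword or the book is in a profile category."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit

# Hypothetical interest profile stored on the server.
profile = {"keywords": ["retrieval"], "categories": ["Library science"]}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Information Retrieval Basics", "category": "Computing"},
    {"title": "Gardening at Home", "category": "Hobbies"},
    {"title": "Cataloguing Rules", "category": "Library science"},
]

alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
# prints: ['Information Retrieval Basics', 'Cataloguing Rules']
```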

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail,
but also the origin of the image, with a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
A term is linked to other terms by the following relations:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced with
added descriptors taken from a controlled list of words and
terms, the searcher can first consult one or several
thesaurus systems to find more, and more suitable, words and
terms; the searcher can then use these to formulate a query
for that database (to increase recall and precision).
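
The use of a thesaurus before querying a database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample entry for "ocean", the helper names `expand_term` and `build_query`, and the AND/OR query syntax are all invented here; real thesaurus systems and databases each have their own contents and syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with its thesaurus relations (BT, NT, RT, UF)."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical sample data, invented for this illustration.
THESAURUS = {
    "ocean": ThesaurusEntry(
        term="ocean",
        broader=["water body"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_term(term: str, include_narrower: bool = True) -> list[str]:
    """Return the term plus its synonyms (and optionally narrower terms)."""
    entry = THESAURUS.get(term.lower())
    if entry is None:
        return [term]  # unknown term: leave the query word unchanged
    terms = [entry.term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    return terms


def build_query(concepts: list[str]) -> str:
    """Combine concepts with AND and the alternatives for one concept with OR."""
    groups = []
    for concept in concepts:
        # Quote multi-word terms so they are searched as exact phrases.
        alternatives = [f'"{t}"' if " " in t else t for t in expand_term(concept)]
        groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


print(build_query(["ocean", "pollution"]))
```

Adding synonyms and narrower terms with OR mainly raises recall; combining separate concepts with AND keeps precision under control.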

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
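
The two directions of citation searching described above can be sketched as follows. The document identifiers and the citation pairs are hypothetical, invented only to show the idea; a real citation index holds millions of such pairs.

```python
from collections import defaultdict

# Hypothetical citation data: (citing document, cited document).
CITATIONS = [
    ("seed-2000", "old-1990"),
    ("seed-2000", "old-1995"),
    ("old-1995", "old-1985"),
    ("new-2003", "seed-2000"),
    ("new-2005", "seed-2000"),
]

references = defaultdict(set)  # document -> documents it cites (older work)
cited_by = defaultdict(set)    # document -> documents that cite it (newer work)
for citing, cited in CITATIONS:
    references[citing].add(cited)
    cited_by[cited].add(citing)


def snowball(seed: str) -> set[str]:
    """Follow reference lists back in time, repeatedly (snowball effect)."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for older in references.get(doc, ()):
            if older not in found:
                found.add(older)
                todo.append(older)
    return found


def citing_documents(seed: str) -> set[str]:
    """Look up more recent documents that cite the seed (citation index)."""
    return set(cited_by.get(seed, ()))


print(sorted(snowball("seed-2000")))          # older publications
print(sorted(citing_documents("seed-2000")))  # more recent publications
```

The backward direction needs only the reference lists of the documents themselves; the forward direction needs an index built over many citing documents, which is what a citation index provides.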

211

***-

Information retrieval:
using citations (scheme)
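[Scheme: a time axis running from Past through Now to Future.
Starting from the seed document, snowball citation searching
leads back to older publications, and citation indexing leads
forward to more recent publications.]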

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
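
Estimating the number of citations received by an article or by an author from the results of a citation search amounts to simple counting. The sketch below assumes hypothetical records (invented article and author names); it also shows why such counts are only estimates, since they cover just the citing documents that the system happens to index.

```python
from collections import Counter

# Hypothetical records: each citing article lists the articles it cites.
CITED_ARTICLES = {
    "article-A": ["article-X", "article-Y"],
    "article-B": ["article-X"],
    "article-C": ["article-X", "article-Z"],
}
# Hypothetical authorship of the cited articles.
AUTHOR_OF = {
    "article-X": "Author 1",
    "article-Y": "Author 1",
    "article-Z": "Author 2",
}

# Citations received per article.
per_article = Counter(
    cited for cited_list in CITED_ARTICLES.values() for cited in cited_list
)

# Citations received per author: sum over that author's articles.
per_author = Counter()
for article, count in per_article.items():
    per_author[AUTHOR_OF[article]] += count

print(per_article.most_common())
print(per_author.most_common())
```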

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
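
Generating the variants listed above can be automated. The following is a minimal sketch; the function name `url_variants` and the list of default file names are assumptions made for this illustration, and the query syntax for link searching still has to be taken from the help pages of the search system used.

```python
def url_variants(url: str, index_names=("index.html", "index.htm")) -> list[str]:
    """Return common spellings under which the start page of a site may be linked."""
    base = url
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # drop the file name, keep the slash
            break
    base = base.rstrip("/")
    variants = [base, base + "/"]
    variants += [f"{base}/{name}" for name in index_names]
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link search, and the result lists merged.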

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know,
we can NOT ONLY use an Internet index search engine with
specialised searching in the hyperlink fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
occurring in the URL of that particular web document or
site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases
and search engines that were among the biggest and the
most powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
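
The difference between the two orderings can be illustrated with a toy example. The entries and their inbound-link counts below are invented, and counting inbound links is only the "simply stated" version of link analysis given above, not Google's actual method.

```python
# Hypothetical directory entries with invented inbound-link counts.
ENTRIES = [
    {"title": "Ocean Data Portal", "inbound_links": 120},
    {"title": "Aquaculture Network", "inbound_links": 40},
    {"title": "Marine Biology Lab", "inbound_links": 250},
]


def alphabetical(entries):
    """Order used in the plain directory: by title."""
    ordered = sorted(entries, key=lambda e: e["title"].lower())
    return [e["title"] for e in ordered]


def by_inbound_links(entries):
    """Order based on link analysis: most-linked sites first."""
    ordered = sorted(entries, key=lambda e: e["inbound_links"], reverse=True)
    return [e["title"] for e in ordered]


print(alphabetical(ENTRIES))
print(by_inbound_links(ENTRIES))
```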

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme] A global subject directory can lead to
• the complete WWW
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
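
The mechanism described above can be sketched with a tiny in-memory index. This is a simplified illustration only: the pages and URLs are hypothetical stand-ins for what a crawler would collect, and real systems add ranking, stemming, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology images and pictures",
    "http://example.org/c": "Open archives for science",
}


def tokenize(text: str) -> list[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)  # built ahead of time, not when a query arrives
print(search(INDEX, "marine science"))
```

A query is answered from the prepared index alone, without contacting the original web servers, which is why the results can lag behind the live pages.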

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
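
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be sketched with a simplified PageRank-style iteration. This is an assumption-laden illustration, not a description of Google's production algorithm: the link graph, the damping factor of 0.85 and the number of iterations are all chosen here only for the example.

```python
def rank_pages(links: dict[str, list[str]], damping: float = 0.85,
               iterations: int = 50) -> dict[str, float]:
    """Compute simplified PageRank-style scores for a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}

scores = rank_pages(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C outranks A, B and E because two pages point to it, and D ranks highest because one of the pages pointing to it is the well-linked page C.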

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered an independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
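
The effect of combining several indexes can be illustrated with a small sketch. The result sets below are invented; the names index_a, index_b, index_c and the function combined_coverage are hypothetical. The sketch only shows that the union of several result sets can be larger than any single one.

```python
# Hypothetical result sets (document identifiers) returned by three
# Internet indexes for one and the same query.
results = {
    "index_a": {"u1", "u2", "u3", "u4"},
    "index_b": {"u3", "u4", "u5"},
    "index_c": {"u1", "u6"},
}

def combined_coverage(result_sets):
    """Return the union of all results and each index's share of that union."""
    union = set().union(*result_sets.values())
    shares = {name: len(docs) / len(union) for name, docs in result_sets.items()}
    return union, shares

union, shares = combined_coverage(results)
print(len(union))  # number of distinct documents found by all indexes together
for name in sorted(shares):
    print(name, round(shares[name], 2))
```

In this invented case no single index finds more than two thirds of what the three indexes find together.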

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
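
Recall and precision can be computed as follows for one result set. This is a minimal sketch; the document identifiers are invented and the function name precision_recall is hypothetical. Precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 5 relevant documents exist.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d5", "d6", "d7"],
)
print(p, r)
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.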

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
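
When truncation is not available, the searcher can list the word variants explicitly. A minimal sketch, assuming that the search system accepts the OR operator; the vocabulary and the function name expand_truncation are hypothetical, and in practice the searcher has to think of the variants.

```python
# Hypothetical vocabulary of word forms known to the searcher.
VOCABULARY = ["library", "libraries", "librarian", "librarians", "liberty"]

def expand_truncation(prefix, vocabulary=VOCABULARY):
    """Emulate the truncated query 'prefix*' with an explicit OR group."""
    variants = sorted(w for w in vocabulary if w.startswith(prefix))
    return " OR ".join(variants)

query = expand_truncation("librar")
print(query)
```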

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
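
To illustrate what a proximity operator such as NEAR checks, the sketch below tests whether two words occur within a given number of words of each other in a text. It is an illustration of the concept only, not a feature of any particular search system; the function name near and the example sentence are hypothetical.

```python
def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

text = "Open access journals are indexed in a free directory of journals."
print(near(text, "open", "journals", 3))
print(near(text, "open", "directory", 3))
```

A searcher could apply such a check afterwards to texts that have already been retrieved, as a partial substitute for the missing operator.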

96

**--

Meta- search systems:
scheme 1
[Diagram. Elements shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram. Elements shown: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
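
The idea of clustering search results on the fly can be illustrated with a toy example. This is NOT the method used by Vivisimo, which is not described here; it is a deliberately simple stand-in technique that groups result snippets under the term they share with most other snippets. The stop word list, the snippets and the function name cluster_results are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "by", "for", "to"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query term."""
    query_words = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        words = {w.strip(".,:;").lower() for w in snippet.split()} - {""}
        term_sets.append(words - query_words - STOPWORDS)
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        if terms:
            # Most frequent term; ties are broken alphabetically.
            heading = min(terms, key=lambda t: (-doc_freq[t], t))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming tutorial",
    "Free Pascal compiler for programming",
    "Blaise Pascal, philosopher",
    "Pascal the philosopher and mathematician",
]
clusters = cluster_results("pascal", snippets)
for heading in sorted(clusters):
    print(heading, len(clusters[heading]))
```

The headings are derived only from the retrieved snippets at search time, so no pre-processing of the documents is needed.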

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements shown: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
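
The time saving comes from sending one query to several search systems in parallel. A minimal sketch of this multi-threaded approach follows. The two "search engines" are hypothetical stand-in functions that return fixed lists without any network access; a real meta-search system would have to call the interface of each target system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems: each takes a query
# and returns a ranked list of URLs. No network access is performed.
def search_engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]

def search_engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]

def meta_search(query, engines):
    """Send one query to all engines in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

engines = {"A": search_engine_a, "B": search_engine_b}
results = meta_search("open access", engines)
print(results)
```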

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
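
Such an integration with removal of repeated results could look as follows. This is a sketch under simple assumptions: the ranked lists are interleaved, and two URLs are treated as the same result when they are equal after a crude normalisation. The function names normalise and merge_results are hypothetical, and actual meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Crude normalisation: lower-case host, drop 'www.' and trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists and drop repeated results."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                key = normalise(lst[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged

list_a = ["http://www.doaj.org/", "http://www.scirus.com/"]
list_b = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([list_a, list_b])
print(merged)
```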

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
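
Tracking modifications of one existing page can be sketched as follows: a fingerprint of the page content is stored, and compared later with the fingerprint of the page as it is then. This is only an illustration of the principle, not the method of any particular service; the page texts are invented, and a real tracker would have to download the page at regular intervals.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint, current_text):
    """Compare a stored fingerprint with the page as fetched now."""
    return fingerprint(current_text) != stored_fingerprint

# Hypothetical page versions.
old_version = "<html><body>Programme of the conference</body></html>"
new_version = "<html><body>Programme of the conference (updated)</body></html>"

stored = fingerprint(old_version)
print(has_changed(stored, old_version))  # unchanged page
print(has_changed(stored, new_version))  # modified page
```

Any change in the page, even a trivial one, alters the fingerprint, so such a simple tracker cannot judge whether a change is relevant.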

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
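
The principle of an alert based on a stored query can be sketched as follows: the query is run again at intervals, and only the results that were not in the previous run are reported. The function name new_hits and the result lists are hypothetical; how Google Alert works internally is not described here.

```python
def new_hits(previous_results, current_results):
    """Return results of a stored query that were not seen in the previous run."""
    seen = set(previous_results)
    return [url for url in current_results if url not in seen]

# Hypothetical result lists of one stored query on two consecutive days.
monday = ["http://example.org/a", "http://example.org/b"]
tuesday = ["http://example.org/b", "http://example.org/c", "http://example.org/a"]

alert = new_hits(monday, tuesday)
print(alert)
```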

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
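
Matching new books against a stored interest profile can be sketched as follows. The profile structure (keywords and subject categories), the book records and the function name matches_profile are assumptions for illustration; they do not describe the system of any particular bookshop.

```python
def matches_profile(book, profile):
    """True if the title contains a profile keyword or the category is in the profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book.get("category") in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical interest profile stored on the server.
profile = {
    "keywords": ["oceanography", "aquaculture"],
    "categories": ["Marine biology"],
}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral reef fishes", "category": "Marine biology"},
    {"title": "A history of printing", "category": "History"},
]

alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)
```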

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results offer not
only links to the image files on the Internet, but also
small versions of the images (so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, the thumbnail is shown together with
its origin as a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT show the origin of each picture as a readable URL
next to its thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled list,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
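
The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration: the thesaurus entry for "sea", the function names `expand_term` and `build_query`, and the choice to follow only the UF and NT relations are all assumptions made for the example, not part of any real thesaurus system.

```python
# Toy thesaurus: every entry and relation below is invented for illustration.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader terms
        "NT": ["bay", "gulf"],     # narrower terms
        "RT": ["coast"],           # related terms
        "UF": ["ocean"],           # synonyms ("used for")
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term followed by thesaurus terms found via the given relations."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_query(terms):
    """OR the variants of each concept together, then AND the concepts."""
    groups = []
    for term in terms:
        # Quote multi-word terms so that they are searched as phrases.
        variants = ['"%s"' % v if " " in v else v for v in expand_term(term)]
        if len(variants) > 1:
            groups.append("(" + " OR ".join(variants) + ")")
        else:
            groups.append(variants[0])
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms and narrower terms mainly raises recall; choosing the more suitable term from the thesaurus in the first place is what helps precision.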

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
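
The two directions of citation searching can be illustrated with a small sketch. The citation data, the document names and the function names `snowball` and `cited_by` are invented for this example; a real citation index holds the same kind of information for millions of documents.

```python
# Toy citation data: each key cites the documents in its list (all names invented).
CITES = {
    "seed-2000": ["a-1995", "b-1997"],
    "a-1995": ["c-1990"],
    "b-1997": ["c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, collecting all older documents."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in cites.get(doc, []):
            if ref not in found:
                found.append(ref)
                queue.append(ref)
    return found


def cited_by(seed, cites):
    """Invert the citation data, as a citation index does, to find citing documents."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print(snowball("seed-2000", CITES))
print(cited_by("seed-2000", CITES))
```

The first function needs only the reference lists of the documents themselves; the second needs the inverted data that a citation index provides.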

211

***-

Information retrieval:
using citations (scheme)
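[Scheme: a time axis from past through now to future. Starting from the
seed document, snowball citation searching leads back to older publications,
while citation indexing leads forward to more recent ones.]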

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be an important factor in evaluating the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be an important factor in evaluating the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
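
Generating these variants by hand is error-prone, so a small helper can be useful. The sketch below is only an illustration: the function name `url_variants` is made up, and it assumes that the default page of the site is called index.html (or index.htm), which depends on the web server.

```python
def url_variants(url):
    """Return the spellings under which the same page may be linked."""
    # Drop the scheme so that only host and path remain.
    if "://" in url:
        url = url.split("://", 1)[1]
    base = url.lstrip("/")
    # Assumption: the default page is named index.html or index.htm.
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]
            break
    base = base.rstrip("/")
    return ["//" + base + "/index.html", "//" + base + "/", "//" + base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search engine, using the syntax given in its help pages.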

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 174

1



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
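
The difference between an alphabetical listing and a listing ranked by links received can be shown with a minimal sketch. The link data and the function name `rank_by_inlinks` are invented for this example, and simple link counting is only the "simply stated" version given above, not Google's actual link analysis.

```python
from collections import Counter

# Hypothetical link data: (source page, target site) pairs, invented for illustration.
LINKS = [
    ("p1", "site-b"), ("p2", "site-b"), ("p3", "site-b"),
    ("p1", "site-a"),
    ("p2", "site-c"), ("p4", "site-c"),
]


def rank_by_inlinks(sites, links):
    """Sort directory entries by links received; ties fall back to alphabetical order."""
    inlinks = Counter(target for _, target in links)
    return sorted(sites, key=lambda site: (-inlinks[site], site))


alphabetical = sorted(["site-c", "site-a", "site-b"])
print(alphabetical)                          # order as in a purely alphabetical directory
print(rank_by_inlinks(alphabetical, LINKS))  # order by number of links received
```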

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
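
The mechanism can be illustrated with a very small inverted index. In this sketch the dictionary `PAGES` stands in for the pages collected by the robot, and the URLs, texts and function names are invented for the example; real systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image database",
    "http://example.org/c": "Open archives for computer science",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query word (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine science"))
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why such a system can respond quickly but may return out-of-date results.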

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full-text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
the more familiar Google web search and the
Amazon.com book search system separately.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database
and search engine Alltheweb, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
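
The idea that a page ranks higher when many pages, and "important" pages, point to it can be sketched with a simplified PageRank-style calculation. The slides do not describe Google's algorithm in detail, so this is only a stand-in under stated assumptions: the link data, the damping value 0.85 and the number of iterations are all chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Invented example: "hub" is linked to by every other page.
LINKS = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
ranks = pagerank(LINKS)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this example page "a" receives only two links, but one of them comes from the highly ranked "hub", so it ends up well above "b" and "c", which receive no links at all.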

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
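What the tilde does can be imitated by hand. The sketch below assumes a small, hypothetical synonym table (SYNONYMS) and a hypothetical helper expand_query; it only shows the principle of replacing a marked word by an OR group, and says nothing about how Google selects its synonyms.

```python
# Hypothetical synonym table; a real system would consult a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each word marked with ~ by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("~cheap ~car rental brussels")
# expanded == "(cheap OR inexpensive OR affordable) (car OR automobile OR vehicle) rental brussels"
```

The same expanded query string can be typed manually into retrieval systems that support OR but not the tilde.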

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
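The two quality measures named on this slide, recall and precision, can be computed as follows. This is a minimal sketch: the document identifiers and the two result lists are invented for illustration and do not describe any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of what was retrieved is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of what is relevant was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example data: d* are relevant documents, x* are irrelevant ones.
relevant_docs = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

general_scores = precision_recall(general_engine, relevant_docs)          # (0.25, 0.5)
specialised_scores = precision_recall(specialised_engine, relevant_docs)  # (0.75, 0.75)
```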

90

**--

Internet indexes:
non-global, regional systems

[Diagram: the complete WWW; the part covered by a global / international Internet index; the part covered by an index limited to sources in/of a country or region]

91

**--

Internet indexes:
subject-specific, specialised systems

[Diagram: the complete WWW; the part covered by a global / international Internet index; the part covered by an Internet index limited to sources related to a specific subject]

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
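When a system offers no truncation, the searcher can list the word variants and combine them with OR. The sketch below shows that principle. The helper expand_truncation and the small vocabulary list are hypothetical; a real searcher would take the variants from a dictionary or from knowledge of the subject.

```python
def expand_truncation(term, vocabulary):
    """Return the vocabulary words matched by a right-truncated term such as 'librar*'."""
    if not term.endswith("*"):
        return [term]
    stem = term[:-1].lower()
    return sorted(w for w in vocabulary if w.lower().startswith(stem))


# Hypothetical vocabulary, for illustration only.
vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
variants = expand_truncation("librar*", vocabulary)
or_query = " OR ".join(variants)
# or_query == "librarian OR libraries OR library"
```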

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
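To make clear what a proximity operator such as NEAR does, here is a minimal sketch of the test on one text. The function near and the default distance of 5 words are assumptions for illustration; retrieval systems that offer NEAR each define their own maximum distance.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True when word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sample = ("Open access journals are listed in a directory "
          "that is maintained by a university library.")
close_pair = near(sample, "open", "journals", max_distance=2)  # True
far_pair = near(sample, "open", "library", max_distance=5)     # False
```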

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
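Clustering on the fly can be illustrated with a very simple grouping rule applied to the result snippets only, using the ambiguous query pascal from the earlier Teoma example. This is a sketch under strong assumptions: the function cluster_results, the stopword list and the four snippets are invented, and the rule is far simpler than the method Vivisimo uses, which is not described in these slides.

```python
from collections import Counter, defaultdict

# Small illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "for", "to", "is", "by", "was"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word that is not a query word or stopword."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {w.strip(".,;:!?()").lower() for w in snippet.split()}
        tokenised.append(words - query_words - STOPWORDS - {""})
    # Count in how many snippets each remaining word occurs.
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Label: the word shared with most snippets; ties are broken alphabetically.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented result snippets for the ambiguous query "pascal".
results = [
    "Blaise Pascal was a French philosopher",
    "Pascal the philosopher and mathematician",
    "Pascal programming language tutorial",
    "Learn programming in Pascal",
]
groups = cluster_results("pascal", results)
```

Here the four snippets fall into two groups, labelled "philosopher" and "programming", without any processing of the documents before the search.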

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
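The integration step with removal of repeated results can be sketched as follows. The helpers normalise and merge_results are hypothetical, the two result lists are invented, and the normalisation rule (host without "www.", path without trailing slash, query string ignored) is a simplifying assumption; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Build a comparison key: lower-case host without 'www.' plus path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Invented result lists from two primary search systems.
engine_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/", "http://www.scirus.com"]
merged = merge_results(engine_1, engine_2)
```

Interleaving by rank keeps the top hit of every engine near the top of the merged list; the first occurrence of a repeated URL is the one that is kept.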

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet and the WWW, showing: telnet, ftp, ...; databases and file archives accessible through the Internet (CGI, ASP, ...); static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
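Tracking modifications in an existing page comes down to comparing the current contents with a stored fingerprint of an earlier version. The sketch below shows that comparison on page texts held in strings; the helper names and the sample pages are invented, and fetching the page from the WWW and sending the alert by email are left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text after normalising whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with the fingerprint of the current page text."""
    return fingerprint(current_text) != stored_fingerprint


# Invented page versions, for illustration only.
old_page = "<html><body>Conference programme:   to be announced</body></html>"
same_page = "<html><body>Conference programme: to be announced</body></html>"
new_page = "<html><body>Conference programme: now available</body></html>"

stored = fingerprint(old_page)
unchanged_alert = has_changed(stored, same_page)  # False: only whitespace differs
changed_alert = has_changed(stored, new_page)     # True: the text was modified
```

Storing only the fingerprint means the service does not need to keep a full copy of every tracked page, at the cost of not being able to show what exactly was changed.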

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail
but also the origin with a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
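
The two-step approach above (first consult a thesaurus, then formulate the query) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny thesaurus, the class `ThesaurusEntry` and the functions `expand_term` and `build_query` are invented for this example, and the Boolean syntax (AND / OR, quoted phrases) must be adapted to the database that is actually searched.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the classical thesaurus relations (hypothetical sample data)."""
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Tiny invented thesaurus; a real one would hold thousands of terms.
THESAURUS = {
    "ocean": ThesaurusEntry(
        broader=["body of water"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_term(term: str, include_narrower: bool = True) -> list[str]:
    """Return the term plus its synonyms (UF) and, optionally, narrower terms (NT)."""
    entry = THESAURUS.get(term)
    if entry is None:
        return [term]          # unknown term: keep it unchanged
    terms = [term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    return terms


def build_query(concepts: list[str]) -> str:
    """OR the variants of each concept together, then AND the concepts."""
    groups = []
    for concept in concepts:
        # Quote multi-word terms so that they are searched as phrases.
        variants = [f'"{t}"' if " " in t else t for t in expand_term(concept)]
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


query = build_query(["ocean", "pollution"])
print(query)
```

Adding synonyms and narrower terms with OR aims at higher recall; combining the concepts with AND keeps precision under control.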

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
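
Both directions can be illustrated with a small Python sketch. It assumes an invented set of reference lists (`CITES`) and hypothetical function names; a real citation index is built from millions of reference lists, but the principle of inverting them is the same.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["a1990", "b1995"],
    "a1990": ["c1980"],
    "b1995": ["c1980"],
    "d2003": ["seed"],
    "e2004": ["seed", "d2003"],
}


def snowball(seed: str, depth: int = 2) -> set[str]:
    """Follow reference lists backwards in time, up to `depth` steps."""
    found: set[str] = set()
    frontier = {seed}
    for _ in range(depth):
        # Collect the references of the current documents, skipping known ones.
        frontier = {ref for doc in frontier for ref in CITES.get(doc, [])} - found
        found |= frontier
    return found


def build_citation_index(cites: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the reference lists: cited document -> documents citing it."""
    index: dict[str, list[str]] = {}
    for citing, refs in cites.items():
        for cited in refs:
            index.setdefault(cited, []).append(citing)
    return index


older = snowball("seed")                      # towards the past
newer = build_citation_index(CITES)["seed"]   # towards the future
print(sorted(older))
print(sorted(newer))
```

The length of the list found in the citation index is a simple count of the citations received by the seed document.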

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
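
Generating such variants can be automated. The following Python sketch is only an illustration: the function name `url_variants` is invented, and it assumes that `index.html` or `index.htm` is the default page of a directory, which depends on the configuration of the web server.

```python
def url_variants(url: str) -> list[str]:
    """Return common spellings of one address, without the scheme prefix."""
    # Drop the scheme so that "http://host/..." and "host/..." are treated alike.
    address = url.split("://", 1)[-1]
    variants = [address]
    # Assumption: "/website/index.html" is also reachable as "/website/".
    for default_page in ("index.html", "index.htm"):
        if address.endswith("/" + default_page):
            address = address[: -len(default_page)]
            variants.append(address)
            break
    # "/website/" and "/website" normally point to the same page.
    if address.endswith("/"):
        variants.append(address.rstrip("/"))
    elif "." not in address.rsplit("/", 1)[-1]:
        variants.append(address + "/")
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link query, in the syntax required by the chosen search system.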

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!




2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
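
The mechanism can be sketched as a small inverted index in Python. The pages below are invented sample data standing in for what a crawler would collect, the function names are hypothetical, and the search combines the query words with an implicit Boolean AND; real systems add relevance ranking and many refinements.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology image collection",
    "http://example.org/c": "Open archives for computer science",
}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain all query words (implicit Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)   # built once, ahead of any query
print(sorted(search(INDEX, "marine science")))
```

Because the index is built in advance, a query is answered from the database and not from the live Internet, which is why results can be fast but also out of date.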

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
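
The idea that links from many pages, and from “important” pages, raise the rank can be illustrated with a simplified PageRank-style iteration in Python. This is a sketch under assumptions, not the ranking that Google actually runs: the function name `link_rank`, the damping value 0.85, the number of iterations and the sample link graph are all chosen for illustration only.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Score pages by incoming links; links from high-scoring pages weigh more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "blog": ["home"],
}

ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sample graph, the page that receives the most links ends up with the highest score, and the page that receives no links at all ends up with the lowest.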

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
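
What such an automatic expansion amounts to can be sketched as follows. This is only an illustration under stated assumptions: the synonym table is hypothetical and tiny, and the slides do not describe how Google builds or applies its own synonym lists.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


expanded_query = expand_query("rent ~car brussels")
```

Only the word marked with the tilde is expanded; the other words of the query are passed on unchanged.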

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
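
Recall and precision, as named on this slide, can be made concrete with a small worked example. The sketch below uses the usual definitions (precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant); the document identifiers and both result sets are hypothetical.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the known relevant set."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: 4 documents are relevant for the information need.
relevant_docs = ["d1", "d2", "d9", "d10"]

# A general index returns 8 documents, of which 2 are relevant.
general = precision_and_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant_docs
)

# A specialised index returns 4 documents, of which 3 are relevant.
specialised = precision_and_recall(["d1", "d2", "d9", "d11"], relevant_docs)
```

With these invented numbers the general index reaches a precision of 0.25 and a recall of 0.5, while the specialised index reaches 0.75 for both.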

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
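
When truncation is not available, a searcher can expand a truncated term by hand into its variants. A minimal sketch, assuming a hypothetical word list and assuming that the target search system accepts an OR operator between alternatives:

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary; in practice this could come from a dictionary or thesaurus.
VOCABULARY = ["library", "libraries", "librarian", "librarians", "liberty", "literature"]


def expand_truncation(pattern, vocabulary=VOCABULARY):
    """Return the vocabulary words matching a truncated term such as 'librar*'."""
    return [word for word in vocabulary if fnmatchcase(word, pattern.lower())]


def or_query(pattern, vocabulary=VOCABULARY):
    """Build an OR query from all words that match the truncated term."""
    matches = expand_truncation(pattern, vocabulary)
    return " OR ".join(matches) if matches else pattern.rstrip("*")


query = or_query("librar*")
```

The resulting query lists every matching word form explicitly, which is what the truncation symbol would otherwise do automatically.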

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
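
A proximity operator such as NEAR checks whether two words occur close to each other in a text. The sketch below shows the idea as a filter that a searcher could apply to texts that have already been retrieved; the function name and the default window of 5 words are assumptions, not features of any system mentioned here.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True when term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


sentence = "Open access journals are free for every reader"
close_enough = near(sentence, "open", "free", max_distance=5)
too_far = near(sentence, "open", "free", max_distance=2)
```

In the example sentence the two words are four positions apart, so they match with a window of 5 but not with a window of 2.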

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
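
Clustering on the fly means that the groups are derived only from the results that come back for one query. The sketch below is a deliberately naive illustration of that idea and is not presented as the method used by Vivisimo, which the slides do not describe. The stopword list and the example results are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that each one contains."""
    query_terms = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(words - STOPWORDS - query_terms)
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        shared = [t for t in terms if doc_freq[t] > 1]
        if shared:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(shared, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
hits = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal the philosopher: life and work",
    "Pascal programming language tutorial",
    "Free compiler for the Pascal programming language",
    "Pascal unit of pressure",
]
groups = cluster_results("pascal", hits)
```

No index of the documents is prepared in advance: the headings are computed from the retrieved titles only, at the moment of the search.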

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
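
Sending one query to several primary search systems in parallel and integrating the answers without repeats can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists (no network access is made), and the rule used to decide that two URLs are the same is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Send the query to all engines in parallel and merge results without repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave the lists so that the top hit of every engine comes first.
    for rank in range(max(len(results) for results in result_lists)):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://www.example.org/a", "http://example.org/b/"]


def engine_two(query):
    return ["http://example.org/a/", "http://example.org/c"]


merged_results = meta_search("open access", [engine_one, engine_two])
```

The page that both stand-in engines return is kept only once, so four raw hits become three merged results.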

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
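
The difference between a modified page and a new page can be sketched with stored fingerprints. The code below assumes that the page texts have already been fetched (it does no network access); the URLs and contents are hypothetical, and real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_content):
    """Return a digest of the page text; any modification changes it."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_pages(current_pages, stored_fingerprints):
    """Compare pages (url -> text) with stored fingerprints (url -> digest).

    Returns (new_urls, modified_urls) and updates stored_fingerprints in place.
    """
    new_urls, modified_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored_fingerprints:
            new_urls.append(url)
        elif stored_fingerprints[url] != digest:
            modified_urls.append(url)
        stored_fingerprints[url] = digest
    return new_urls, modified_urls


# Hypothetical pages, checked on two occasions.
stored = {}
first_pass = check_pages({"http://example.org/news": "version 1"}, stored)
second_pass = check_pages(
    {"http://example.org/news": "version 2", "http://example.org/new": "hello"},
    stored,
)
```

On the second check the known page is reported as modified and the unknown page as new; an alert service would then notify the subscriber of both.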

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
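
Matching new books against a stored interest profile can be sketched as follows. The record layout (a title and one subject category per book) and the example data are assumptions made for illustration; this does not describe how Amazon or any other bookshop implements its alerts.

```python
def matching_books(new_books, profile_keywords, profile_categories):
    """Return the titles of new books that fit a stored interest profile."""
    keywords = {k.lower() for k in profile_keywords}
    categories = {c.lower() for c in profile_categories}
    alerts = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        # A book fits when a keyword occurs in its title or its category is selected.
        if title_words & keywords or book["category"].lower() in categories:
            alerts.append(book["title"])
    return alerts


# Hypothetical list of newly published books.
new_titles = [
    {"title": "Marine ecology of the North Sea", "category": "Biology"},
    {"title": "Cooking for beginners", "category": "Food"},
    {"title": "Statistics handbook", "category": "Oceanography"},
]
alerts = matching_books(new_titles, ["ecology"], ["oceanography"])
```

The first book matches on a keyword in its title and the third on its subject category, so both would be announced to this user.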

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
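
The workflow above can be sketched in a few lines of Python. This is only a minimal illustration: the tiny thesaurus, its entries and the function name expand_query are assumptions made up for this example and are not taken from any real thesaurus system. Each query word is combined (OR) with its UF and NT terms, and the resulting groups are combined with AND.

```python
# Toy thesaurus: hypothetical entries, for illustration only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"]},
    "pollution": {"UF": ["contamination"], "NT": ["oil spill"]},
}


def expand_query(words, thesaurus=THESAURUS, relations=("UF", "NT")):
    """Return a Boolean query: each word is OR-ed with its thesaurus terms,
    and the resulting groups are AND-ed together."""
    groups = []
    for word in words:
        entry = thesaurus.get(word.lower(), {})
        terms = [word]
        for relation in relations:
            terms.extend(entry.get(relation, []))
        # Quote multi-word terms so they are searched as exact phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(expand_query(["sea", "pollution"]))
```

The exact Boolean syntax (OR, AND, quotation marks) differs between databases, so the generated query string may need adapting to the system that is searched.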

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
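
The two directions of citation searching can be illustrated with a small Python sketch. The citation data and the paper identifiers below are invented for the example; a real citation index is of course far larger. The function snowball follows reference lists back to older work, while cited_by does what a citation index offers: it finds the more recent papers that cite the seed document.

```python
# Toy citation data: each paper maps to the papers it cites (hypothetical IDs).
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(paper, cites=CITES):
    """Backward search: follow reference lists repeatedly to older work."""
    found, todo = set(), list(cites.get(paper, []))
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(cites.get(current, []))
    return found


def cited_by(paper, cites=CITES):
    """Forward search: the papers whose reference lists contain the paper."""
    return {citing for citing, refs in cites.items() if paper in refs}


print(sorted(snowball("seed")))
print(sorted(cited_by("seed")))
```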

211

***-

Information retrieval:
using citations (scheme)
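[Scheme: a time axis running from Past through Now to Future.
Starting from the seed document, snowball citation searching leads
back to older publications; citation indexing leads forward to more
recent publications that cite the seed document.]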

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
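
Generating these variants by hand is error-prone, so a small helper can do it. The following Python sketch is an illustration only: the function name url_variants is made up, and it assumes that the default page of the site is called index.html, which is not the case on every web server.

```python
def url_variants(url):
    """Return the common spellings of a site address, without the scheme,
    so that each one can be tried in a link search."""
    # Drop the scheme ("http://") if present.
    address = url.split("://", 1)[-1]
    # Strip a trailing default page name and a trailing slash.
    if address.endswith("/index.html"):
        address = address[: -len("/index.html")]
    address = address.rstrip("/")
    return [address + "/index.html", address + "/", address]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in the link search syntax of the chosen search system.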

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
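
The mechanism described above can be imitated on a very small scale. The Python sketch below is a simplified illustration, not the implementation of any real search engine: the three pages and their URLs are invented stand-ins for what a crawler would collect, and the index only supports queries in which all words must occur (Boolean AND).

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler (hypothetical URLs and texts).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the index that was built in advance, not by visiting the pages at query time, which is why such systems can respond quickly.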

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
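
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a small link-based scoring routine. This is only a simplified sketch in the spirit of published link-analysis methods; it is not the actual Google algorithm, and the function name, the damping value and the toy link structure are assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links maps a page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages point to "hub", and "hub" points to "a".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = link_rank(toy_web)
```

In this toy example "hub" scores highest because many pages point to it, and "a" scores higher than "b" or "c" because the important page "hub" points to it.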

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
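
What such an automatic expansion might do behind the scenes can be sketched as follows. The mini-thesaurus, the function name and the OR-group output format are assumptions for illustration only; how Google implements the tilde operator internally is not described here.

```python
# Hypothetical mini-thesaurus; a real system would rely on a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Expand each ~word into (word OR synonym OR ...); leave other words as they are."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)
```

Only the word marked with a tilde is expanded: in the query `cheap ~car rental`, "car" becomes an OR group while "cheap" is left untouched, even though the thesaurus knows synonyms for it.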

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
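
Recall and precision can be made concrete with a small calculation. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and the set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).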

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
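
To show what manual truncation means in systems that do support it, the sketch below turns a truncated term such as `librar*` into a pattern that matches every word starting with that stem. The `*` symbol and the function names are assumptions for illustration; truncation symbols differ between retrieval systems.

```python
import re


def truncation_to_regex(pattern):
    """Convert a truncated term such as 'librar*' into a whole-word regular expression."""
    # Escape the term, then let each '*' stand for any run of word characters.
    escaped = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


def find_matches(pattern, text):
    """Return all words in the text that match the truncated term."""
    return truncation_to_regex(pattern).findall(text)


matches = find_matches("librar*", "The library and the librarians of many Libraries")
```

Without truncation, the searcher has to type each of these word forms separately and combine them with OR.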

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
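
A proximity operator such as NEAR checks whether two words occur close to each other in a document. A minimal sketch, assuming a simple word-distance definition; the function name and the default distance are hypothetical, and real systems define proximity in various ways.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sentence = "Open access journals are free for every reader"
```

With this sentence, `near(sentence, "open", "free")` is true (the words are 4 positions apart), while `near(sentence, "open", "reader", max_distance=3)` is false.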

96

**--

Meta- search systems:
scheme 1
[Diagram with the following elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram with the following elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the following elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
[Diagram with the following elements: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
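
Clustering "on the fly" means that the groups are derived only from the retrieved results, at query time. The sketch below is a deliberately crude illustration of that idea: each result snippet is filed under the term it shares with the most other snippets. It is not the Vivisimo algorithm; the stop word list, the labelling rule and the example snippets are all assumptions.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that each one contains."""
    query_terms = set(query.lower().split())
    tokenized = []
    for snippet in snippets:
        terms = set(re.findall(r"[a-z]+", snippet.lower())) - STOPWORDS - query_terms
        tokenized.append(terms)
    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(t for terms in tokenized for t in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, tokenized):
        if terms:
            # Most shared term wins; ties are broken alphabetically.
            label = min(terms, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
clusters = cluster_results("pascal", [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pensees by the philosopher Blaise Pascal",
])
```

For these four snippets the routine produces two groups, labelled "language" and "blaise", which separates the programming language from the philosopher without any processing of documents before the search.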

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the following elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
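
The time saving comes from sending one query to several search systems in parallel instead of one after the other. A minimal sketch of such a multi-threaded search is given below. The two engine functions are hypothetical stand-ins that return fixed results; a real meta-search system would contact the actual search services over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems; each returns a ranked list of URLs.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Send the query to every engine in parallel; return the results per engine name."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"one": engine_one, "two": engine_two})
```

The user deals with a single call and a single result structure, whatever the number of underlying search systems.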

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
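
The integration step with removal of repeated results can be sketched as follows. The URL normalisation shown is a crude assumption (for instance, it treats addresses with and without "www." as the same page and ignores upper/lower case), and the round-robin merging rule is only one of many possible ways to combine ranked lists.

```python
def normalize(url):
    """Crude normalisation: lower-case, drop the scheme, 'www.' and a trailing slash."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping only the first copy of each URL."""
    seen, merged = set(), []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


merged = merge_results([
    ["http://www.dogpile.com/", "http://example.org/x"],
    ["http://dogpile.com", "http://example.org/y"],
])
```

Both input lists start with the same site written in two ways; the merged list contains it only once, followed by the two remaining results.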

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet and the WWW, with the following labelled parts:
»telnet, ftp, ...
»databases and file archives accessible through the Internet (CGI, ASP,...)
»static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
»information accessible only when passwords are used
»rapidly changing information, such as news
»Word files
»PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
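
The two tasks differ technically. Tracking a modification means comparing the current version of one known page with a stored version; finding new pages means comparing the current result list of a stored query with the list seen earlier. A minimal sketch of both, in which the function names are hypothetical and fetching the pages or running the query is left out:

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Modified page: the current text no longer matches the stored fingerprint."""
    return fingerprint(current_text) != stored_fingerprint


def new_pages(previous_urls, current_urls):
    """New pages: URLs in the current result list that were not seen before."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]


stored = fingerprint("Version of Monday")
```

An alert service would store the fingerprint or the list of known URLs on its server and send an e-mail when `has_changed` returns true or `new_pages` returns a non-empty list.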

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
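
How a stored interest profile could be matched against a newly published book is sketched below. The record layout (title, subjects) and the profile layout (keywords, categories) are assumptions for illustration; they do not describe how Amazon or any other bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """True if the book hits any keyword or subject category of the interest profile."""
    title = book["title"].lower()
    keyword_hit = any(k.lower() in title for k in profile.get("keywords", []))
    category_hit = bool(set(book.get("subjects", [])) & set(profile.get("categories", [])))
    return keyword_hit or category_hit


# Hypothetical interest profile and new book records.
profile = {"keywords": ["information retrieval"], "categories": ["Library science"]}
book_a = {"title": "Modern Information Retrieval", "subjects": ["Computer science"]}
book_b = {"title": "Cataloguing Basics", "subjects": ["Library science"]}
book_c = {"title": "Garden Birds", "subjects": ["Nature"]}
```

Here `book_a` matches through a keyword in the title, `book_b` through its subject category, and `book_c` does not match, so only the first two would trigger an alert.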

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
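
The query preparation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus entries, the `ThesaurusEntry` structure and the `expand_query` helper are hypothetical and are not taken from any of the thesaurus systems mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the usual thesaurus relations."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical toy entry, invented for illustration only.
THESAURUS = {
    "sea": ThesaurusEntry(
        term="sea",
        broader=["body of water"],
        narrower=["coastal sea"],
        related=["marine science"],
        used_for=["ocean"],
    ),
}


def expand_query(term: str, include_narrower: bool = True) -> str:
    """Build an OR query from a term, its synonyms and, optionally, its narrower terms."""
    entry = THESAURUS.get(term)
    if entry is None:
        # Term not found in the thesaurus: search for the term as typed.
        return f'"{term}"'
    candidates = [entry.term] + entry.used_for
    if include_narrower:
        candidates += entry.narrower
    return " OR ".join(f'"{c}"' for c in candidates)


expanded = expand_query("sea")
print(expanded)
```

Adding synonyms and narrower terms with OR mainly helps recall; choosing one precise narrower term instead of a vague broad one mainly helps precision. The exact Boolean syntax depends on the database that is searched.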

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
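
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and reference lists below are hypothetical toy data; real citation indexes are far larger and need matching of inconsistent reference strings, which is not shown here.

```python
from collections import defaultdict

# Hypothetical toy data: each document maps to the documents that it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1998", "paper_1995"],
    "paper_1998": ["paper_1995"],
    "paper_1995": [],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> sorted list of citing documents."""
    index = defaultdict(set)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].add(citing)
    return {doc: sorted(citers) for doc, citers in index.items()}


def snowball(seed, references):
    """Follow reference lists backwards in time and collect all older documents."""
    found, queue = set(), [seed]
    while queue:
        doc = queue.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                queue.append(cited)
    return sorted(found)


citation_index = build_citation_index(REFERENCES)
older = snowball("seed_2000", REFERENCES)        # backwards: snowball effect
newer = citation_index.get("seed_2000", [])      # forwards: citation index
print("older:", older)
print("newer:", newer)
```

The snowball search only needs the documents themselves, whereas the forward search requires an index that somebody has built in advance from many reference lists.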

211

***-

Information retrieval:
using citations (scheme)
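[Scheme: on a time axis running from past to future, snowball
citation searching leads from the seed document back to older
publications, and citation indexing leads forward to more recent
publications that cite the seed document.]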

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
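
Once the results of a citation search are available as a list, counting the citations received per article and per author is simple. The records below are hypothetical; in practice they would come from a citation database or from Google Scholar, and author names would first have to be disambiguated.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of the cited article)
CITATIONS = [
    ("A3", "A1", "Author X"),
    ("A4", "A1", "Author X"),
    ("A4", "A2", "Author X"),
    ("A5", "A3", "Author Y"),
]

# Number of citations received by each article.
per_article = Counter(cited for _, cited, _ in CITATIONS)
# Number of citations received by each author (summed over that author's articles).
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```

Such counts are only an estimate: they depend on the coverage of the database that delivered the records.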

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
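
Generating the variants listed above can be automated. The sketch below is a minimal illustration: the default document names and the `link:` operator are assumptions, so check the help pages of the search system for the operator that it actually supports.

```python
def url_variants(url: str) -> list[str]:
    """Return the common spellings of a URL that may all point to the same page."""
    base = url
    # Strip a default document name if present (assumed names).
    for default_doc in ("index.html", "index.htm"):
        if base.endswith("/" + default_doc):
            base = base[: -len(default_doc)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


def link_queries(url: str, operator: str = "link:") -> list[str]:
    """Prefix every variant with a link-search operator (operator name is an assumption)."""
    return [operator + variant for variant in url_variants(url)]


variants = url_variants("//web-server-computer.country/website/index.html")
for query in link_queries("//web-server-computer.country/website/"):
    print(query)
```

Running each generated query and merging the results reduces the risk of missing links that were written in a different but equivalent form.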

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 177

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
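
The simplified statement above (more incoming links, higher rank) can be sketched as follows. The link graph and site names are hypothetical, and plain counting of incoming links is only the simplification used on this slide, not the actual ranking method of Google.

```python
from collections import Counter

# Hypothetical link graph: site -> sites that it links to.
LINKS = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def rank_by_inlinks(entries, links):
    """Sort directory entries by number of incoming links, then alphabetically."""
    inlinks = Counter(
        target for targets in links.values() for target in set(targets)
    )
    return sorted(entries, key=lambda site: (-inlinks[site], site))


alphabetical = sorted(LINKS)                 # plain alphabetical listing
ranked = rank_by_inlinks(alphabetical, LINKS)  # listing ranked by incoming links
print(ranked)
```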

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
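
The two components described above, a machine-made index and a search function on top of it, can be sketched in Python. The pages and URLs are hypothetical stand-ins for what a crawler would collect; a real system also handles ranking, updates and a far larger scale.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/sea": "Marine science studies the sea and the ocean",
    "http://example.org/fish": "Fishing and fish stocks in the North Sea",
    "http://example.org/lib": "Library catalogues and bibliographic databases",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain all words of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "sea"))
print(search(index, "North Sea"))
```

Because the query is answered from the index and not from the live Internet, the answer is fast, but it can only be as complete and as recent as the last crawl.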

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
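
The idea that a page ranks higher when many pages, and "important" pages, point to it can be sketched with a simple iterative computation. This is only an illustration in the spirit of link-based ranking, not Google's actual algorithm; the damping value 0.85, the function name `link_rank` and the tiny link graph are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from many pages, and from high-scoring pages, raise a score."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A and B both point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(links)
```

In this graph C scores higher than A because two pages point to it, and D scores higher than A because the "important" page C points to it.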

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
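
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are hypothetical; how Google itself selects synonyms is not described in this presentation.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_query("rent ~cheap car")
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.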

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: up to 2004, Yahoo! did not yet have its own search system
and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest systems that rely on their own
Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
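
Recall and precision, mentioned above, have simple definitions: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved. A small worked sketch, with invented document identifiers and a hypothetical function name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents relevant, 2 of them in common.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
```

Here precision is 2/4 and recall is 2/3. A specialised system can improve both because its database contains less irrelevant material and covers its own field more completely.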

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
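
For readers who are not familiar with these two features, the following sketch shows what truncation and stemming mean in general. It illustrates the concepts only and does not describe any particular search system; the suffix list is deliberately naive and the function names are assumptions.

```python
def truncation_match(pattern, words):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern.lower()]


def crude_stem(word):
    """Strip a few common English suffixes; a deliberately naive illustration."""
    word = word.lower()
    for suffix in ("ies", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


vocabulary = ["library", "libraries", "librarian", "liberty"]
matches = truncation_match("librar*", vocabulary)
```

With truncation the searcher decides where the word is cut off; with stemming the system reduces word variants such as "searching" and "searches" to a common form by itself.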

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
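
A proximity operator such as NEAR requires that two words occur within a given number of words of each other. A minimal sketch of that test, with a hypothetical function name and an invented sentence:

```python
def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


text = "Open access journals are indexed in several free bibliographic databases."
close = near(text, "journals", "indexed", max_distance=3)
far = near(text, "open", "databases", max_distance=3)
```

In this sentence "journals" and "indexed" are 2 words apart, so the first test succeeds; "open" and "databases" are 9 words apart, so the second fails.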

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
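
Clustering "on the fly" means that the groups are derived from the retrieved results themselves at search time. The sketch below is not Vivisimo's actual method; it uses a very simple rule (label each result with the snippet word shared by most results) on invented snippets, and all names are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "was"}


def cluster_results(query, results):
    """Group (title, snippet) results under the most widely shared snippet word."""
    query_words = set(query.lower().split())

    def terms(snippet):
        return {
            w for w in snippet.lower().split()
            if w not in STOPWORDS and w not in query_words
        }

    # Count in how many results each word occurs.
    doc_freq = Counter()
    for _, snippet in results:
        doc_freq.update(terms(snippet))

    clusters = defaultdict(list)
    for title, snippet in results:
        candidates = terms(snippet)
        # Prefer the word shared by most results; break ties alphabetically.
        label = min(candidates, key=lambda w: (-doc_freq[w], w)) if candidates else "other"
        clusters[label].append(title)
    return dict(clusters)


results = [
    ("Page 1", "pascal programming language compiler"),
    ("Page 2", "free pascal compiler download"),
    ("Page 3", "blaise pascal philosopher and mathematician"),
    ("Page 4", "pascal the philosopher of the pensees"),
]
clusters = cluster_results("pascal", results)
```

For the ambiguous query "pascal", the four invented results fall into a "compiler" group and a "philosopher" group, without any processing of the documents before the search.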

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
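
The integration step with removal of repeated results can be sketched as follows. The URL normalisation rule, the function names and the result lists are assumptions made for illustration; real meta-search systems may merge and rank results quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lowercased host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and keep the first occurrence of each page."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/"]
merged = merge_results(engine_1, engine_2)
```

The first result of each system points to the same page, written in two ways, so it appears only once in the merged list.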

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
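
The difference between a modified page and a new page can be made concrete with a sketch that compares two snapshots of a set of pages. Fetching the pages from the WWW is left out; the page texts, URLs and function names are invented, and actual alert services may work differently.

```python
import hashlib


def fingerprint(content):
    """Short, fixed-length summary of a page text; it changes when the text changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def snapshot(pages):
    """Store only a fingerprint per URL instead of the full page text."""
    return {url: fingerprint(text) for url, text in pages.items()}


def detect_changes(old_snapshot, pages):
    """Return (new URLs, modified URLs) compared with an earlier snapshot."""
    new_snapshot = snapshot(pages)
    new_pages = sorted(u for u in new_snapshot if u not in old_snapshot)
    modified = sorted(
        u for u in new_snapshot
        if u in old_snapshot and new_snapshot[u] != old_snapshot[u]
    )
    return new_pages, modified


last_week = snapshot({
    "http://example.org/a": "version 1",
    "http://example.org/b": "stable text",
})
new_pages, modified = detect_changes(last_week, {
    "http://example.org/a": "version 2",
    "http://example.org/b": "stable text",
    "http://example.org/c": "brand new page",
})
```

Tracking modifications only needs the list of pages that the user already knows; finding new pages requires in addition a search over an index, which is why fewer systems offer it.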

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
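
Simultaneous, parallel searching over several catalogues can be sketched with a thread per target system. The catalogues below are small in-memory lists standing in for remote library and bookdealer systems; all names and titles are hypothetical, and real services use protocols such as Z39.50 or WWW forms instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogues standing in for remote systems.
CATALOGUES = {
    "Library A": ["Marine ecology", "Ocean chemistry"],
    "Library B": ["Marine biology handbook", "Freshwater fish"],
    "Bookshop C": ["Introduction to marine science"],
}


def search_catalogue(name, term):
    """Search one catalogue; only a simple title-word search is supported."""
    titles = CATALOGUES[name]
    return name, [t for t in titles if term.lower() in t.lower()]


def simultaneous_search(term):
    """Send the same query to all catalogues in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, term) for name in CATALOGUES]
        return dict(f.result() for f in futures)


found = simultaneous_search("marine")
```

The sketch also shows why search options are limited: the query has to be simple enough to be understood by every target system.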

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
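
The use of a thesaurus before querying an uncontrolled database can be sketched in a few lines of Python. This is only an illustration: the mini-thesaurus, its terms and the function name expand_query are assumptions made for this example, not part of any of the systems mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """Relations of one term, following the usual thesaurus labels."""
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (synonyms)


# Hypothetical mini-thesaurus, invented for this example.
THESAURUS = {
    "sea": ThesaurusEntry(
        broader=["water body"],
        narrower=["coastal sea"],
        related=["marine science"],
        used_for=["ocean"],
    ),
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and (optionally) narrower terms."""
    terms = [term]
    entry = thesaurus.get(term)
    if entry is not None:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    unique_terms = dict.fromkeys(terms)  # removes duplicates, keeps order
    return " OR ".join(f'"{t}"' for t in unique_terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly aims at higher recall; leaving out broader and related terms is one possible choice to avoid losing too much precision.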

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
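
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the reference lists below are invented for this example; a real citation index is built from the reference lists of many thousands of publications.

```python
# Hypothetical data: each document is mapped to the documents that it cites.
CITES = {
    "A2005": ["B2001", "C1998"],
    "B2001": ["C1998", "D1990"],
    "E2004": ["B2001"],
    "C1998": ["D1990"],
    "D1990": [],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found = []
    queue = list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            queue.extend(cites.get(doc, []))
    return found


def build_citation_index(cites):
    """Invert the reference lists: for each document, list the documents citing it."""
    index = {}
    for citing, references in cites.items():
        for cited in references:
            index.setdefault(cited, []).append(citing)
    return index


citation_index = build_citation_index(CITES)
print(snowball("B2001", CITES))        # older publications reached from the seed
print(citation_index.get("B2001", []))  # more recent publications citing the seed
```

The snowball search needs only the documents themselves, whereas the forward search needs an inverted index, which is why a dedicated citation database is required for it.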

211

***-

Information retrieval:
using citations (scheme)
Scheme along a time axis (past, now, future):
starting from the seed document, snowball citation searching
looks towards the past, and citation indexing looks towards the future.

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
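
Generating these variants can be automated. The following Python sketch assumes that the default document of the site is named index.html or index.htm, which is common but depends on the configuration of the web server; the function name url_variants is chosen only for this example.

```python
def url_variants(url):
    """Return common spellings under which the same page may be linked."""
    # Drop the scheme (http://, https://), as in the variants listed above.
    base = url.split("://", 1)[-1]
    # Drop an explicit default document name (an assumption, see the text).
    for default_doc in ("index.html", "index.htm"):
        if base.endswith("/" + default_doc):
            base = base[: -len(default_doc)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be used in a separate link query, and the results can be merged afterwards.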

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
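
The simplified idea stated above (more received links, higher rank) can be sketched in Python as follows. This is not the link analysis method actually used by Google, which is more refined; the site names and the link list are invented for this example.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Sort directory entries by the number of links they receive (ties: alphabetically)."""
    counts = Counter(target for _source, target in links)
    return sorted(directory_sites, key=lambda site: (-counts[site], site))


# Hypothetical data: (linking page, linked site) pairs found on the WWW.
LINKS = [
    ("page-x", "b.example.org"),
    ("page-y", "b.example.org"),
    ("page-x", "c.example.org"),
]
SITES = ["a.example.org", "b.example.org", "c.example.org"]

print(rank_by_inlinks(SITES, LINKS))
```

With a purely alphabetical listing the order would be a, b, c; with the link counts the most linked site b moves to the top.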

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
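
The two components can be illustrated with a minimal Python sketch: an index that is built automatically from the texts of collected pages, and a search function that consults only that index, not the live Internet. The pages and URLs are invented for this example, and real systems add crawling, ranking and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what the "robot" has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}


def build_index(pages):
    """Map each word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "marine"))
print(search(index, "marine sea"))
```

Because a query is answered from the prepared index, the answer is fast, but it reflects the state of the pages at the time they were collected.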

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a query could contain at most 10 words.)
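
The idea of link-based ranking can be sketched as follows. This is a simplified, PageRank-style iteration on a hypothetical link graph; it is an assumption for illustration only and not the actual algorithm or parameters used by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Rank pages so that pages with many (or important) inbound links score higher.

    links: dict mapping each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D point to C; C points to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C ranks highest because many pages point to it, and A ranks above B and D because the "important" page C points to it.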

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
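
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical; how Google selects synonyms internally is not described here, so this is only an illustration of the principle.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each word marked with a tilde by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("rent ~cheap car"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.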

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
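
Recall and precision, mentioned above, can be computed as in the following minimal sketch. The document identifiers are hypothetical; the definitions are the usual ones from information retrieval.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents are actually relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this example 2 of the 4 retrieved documents are relevant (precision 0.50), and 2 of the 3 relevant documents were found (recall about 0.67).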

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
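
To illustrate what truncation means in retrieval systems that do support it, here is a minimal sketch. The vocabulary and the function name are hypothetical, and the asterisk as truncation symbol is only one common convention.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated search term such as 'librar*' against an index vocabulary."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return [pattern] if pattern in vocabulary else []


# Hypothetical vocabulary of an index (all lower case).
VOCABULARY = ["index", "liberty", "librarian", "libraries", "library"]

print(expand_truncation("librar*", VOCABULARY))
```

In a system without truncation, the searcher has to type such variants explicitly, for instance combined with OR.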

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
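
What a proximity operator such as NEAR does can be sketched as follows. This is a hypothetical, simplified check on one text; the function name and the default distance are assumptions, and real systems evaluate proximity through positional information stored in the index.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True when word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == word_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


TEXT = "Open access journals are indexed by search engines"
print(near(TEXT, "open", "journals", max_distance=2))
print(near(TEXT, "open", "engines", max_distance=3))
```

A plain Boolean AND would accept both word pairs; the proximity check accepts only the pair whose words stand close together.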

96

**--

Meta- search systems:
scheme 1
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
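
The integration of results with removal of repeated results can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists; a real meta-search system would send the query to remote search systems. The queries run in parallel threads, in line with the term "multi-threaded search".

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    """Hypothetical primary search system 1 (returns a fixed, ranked list)."""
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    """Hypothetical primary search system 2 (returns a fixed, ranked list)."""
    return ["http://example.org/a/", "http://example.org/c"]


def normalise(url):
    """Crude normalisation so that trivially different URLs count as repeats."""
    return url.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel and merge their results without repeats."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    # Interleave by rank: first all number-1 results, then all number-2 results...
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


print(meta_search("marine science", [engine_one, engine_two]))
```

Interleaving by rank is only one possible merging strategy; it is chosen here because the individual systems do not report comparable relevance scores.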

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
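
How tracking changes in pages can work in principle is sketched below. The sketch compares a fingerprint (hash) of the current page content with the fingerprint stored at the previous check. The page store and the `fetch` function are hypothetical stand-ins; a real tool would download the pages, for instance with `urllib.request.urlopen(url).read()`, and would send an alert by email.

```python
import hashlib

# Hypothetical page contents, standing in for pages on the WWW.
FAKE_WEB = {
    "http://example.org/news": b"Conference programme, version 1",
    "http://example.org/about": b"About this project",
}


def fetch(url):
    """Stand-in for downloading a page; returns the content as bytes."""
    return FAKE_WEB[url]


def fingerprint(content):
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(content).hexdigest()


def check_pages(urls, known, fetch_page=fetch):
    """Return the URLs that are new or changed since the previous check.

    known: dict with the fingerprint stored per URL; it is updated in place.
    """
    changed = []
    for url in urls:
        digest = fingerprint(fetch_page(url))
        if known.get(url) != digest:
            changed.append(url)
            known[url] = digest
    return changed


known_fingerprints = {}
first_check = check_pages(list(FAKE_WEB), known_fingerprints)    # all pages are new
second_check = check_pages(list(FAKE_WEB), known_fingerprints)   # nothing changed
FAKE_WEB["http://example.org/news"] = b"Conference programme, version 2"
third_check = check_pages(list(FAKE_WEB), known_fingerprints)    # one page changed
print(first_check, second_check, third_check)
```

Comparing fingerprints detects any modification of a page, including trivial ones; deciding whether a change is relevant for the user requires further analysis.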

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
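
Matching new books against stored interest profiles can be sketched as follows. The profiles, the book titles and the function names are hypothetical; the sketch covers only profiles in the form of keywords and does not describe the alert service of any particular bookshop.

```python
import re


def matches_profile(title, keywords):
    """Return True when at least one profile keyword occurs as a word in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return any(keyword.lower() in words for keyword in keywords)


def new_book_alerts(new_books, profiles):
    """Return, per user, the new book titles that fit the stored interest profile."""
    alerts = {}
    for user, keywords in profiles.items():
        hits = [title for title in new_books if matches_profile(title, keywords)]
        if hits:
            alerts[user] = hits
    return alerts


# Hypothetical interest profiles stored on the server, and newly published titles.
PROFILES = {
    "user1": ["oceanography", "marine"],
    "user2": ["thesaurus"],
    "user3": ["astronomy"],
}
NEW_BOOKS = [
    "Introduction to Marine Ecology",
    "Building a Thesaurus for Information Retrieval",
]

alerts = new_book_alerts(NEW_BOOKS, PROFILES)
print(alerts)
```

A user whose profile matches no new title receives no alert, as is the case for the third hypothetical user here.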

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer a classification or a thesaurus for searching.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
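
The query-building step described above can be sketched in Python. This is only a minimal illustration: the tiny thesaurus, its entries for "sea" and the function name expand_query are hypothetical and are not taken from any of the thesaurus systems mentioned in these slides. The relation codes (BT, NT, RT, UF) are the usual thesaurus codes.

```python
# Hypothetical mini-thesaurus: each term maps relation codes to lists of terms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR query joining the term with selected thesaurus terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
```

Adding synonyms (UF) and narrower terms (NT) with OR mainly raises recall; replacing the original word by a more specific term found in the thesaurus is one way to raise precision.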

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there still is no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
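
The two directions of citation searching can be sketched in Python as follows. The document names and the REFERENCES table are invented for illustration; a real citation index holds the same kind of "who cites whom" data on a much larger scale.

```python
# Hypothetical data: each document maps to the list of documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, references):
    """Backward search: follow reference lists from the seed to older work."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The references of a found document are followed as well.
            queue.extend(references.get(doc, []))
    return found


def cited_by(target, references):
    """Forward search: list the documents that cite the target."""
    return sorted(doc for doc, refs in references.items() if target in refs)


print(snowball("seed", REFERENCES))
print(cited_by("seed", REFERENCES))
```

The length of the list returned by cited_by is the number of citations received by a document within this (hypothetical) data set.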

211

***-

Information retrieval:
using citations (scheme)
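Scheme: on a time axis running from past to future, the seed document sits at “now”.
Snowball citation searching leads back from the seed document to older publications (past);
citation indexing leads forward to more recent publications that cite it (future).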

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
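
Generating these variants can be sketched in Python. The function name url_variants is hypothetical, and the sketch assumes that the default page of the site is called index.html, which differs from server to server.

```python
def url_variants(url, default_page="index.html"):
    """Return the spellings of a URL that other pages may have linked to."""
    base = url.rstrip("/")
    suffix = "/" + default_page
    if base.endswith(suffix):
        # Strip the default page name to get the bare directory address.
        base = base[: -len(suffix)]
    return [base + suffix, base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be entered separately in the link search of the chosen search system, using the query syntax that its help pages describe.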

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 179

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
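
The "simply stated" version of this ranking can be sketched in Python. This is only plain counting of inbound links, not the actual link analysis used by Google; the site names, the list of links and the function name rank_by_inlinks are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Sort directory entries by their number of inbound links, most first."""
    inlinks = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order, as in the plain directory.
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))


sites = ["a.example", "b.example", "c.example"]
links = [
    ("x.example", "b.example"),
    ("y.example", "b.example"),
    ("x.example", "c.example"),
]
print(rank_by_inlinks(sites, links))
```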

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Within the complete WWW, a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
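
The mechanism described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical set of already collected pages (the example.org URLs and texts are invented) and builds a simple inverted index from their words. Real Internet indexes are far more elaborate; this only shows the principle that a query is answered from a pre-built database, not from the live Internet.

```python
from collections import defaultdict

# Hypothetical pages that a "robot" might have collected (URL -> text).
PAGES = {
    "http://example.org/a": "marine science and oceanography portal",
    "http://example.org/b": "civil engineering portal",
    "http://example.org/c": "marine engineering journal",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine engineering")))
```

The query is answered by intersecting the sets of URLs stored for each word, which is why searching the index is fast even when collecting the pages was slow.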

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a query could contain a maximum of 10 words.)
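
The idea that a page ranks higher when many pages, or "important" pages, point to it can be sketched with a small iterative computation. This is a simplified illustration in the spirit of link-based ranking, not Google's actual algorithm; the function name `link_rank`, the damping value and the four-page link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; `links` maps each page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score in each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, C points to D.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph, C scores higher than A and B because two pages point to it, and D scores high although only one page points to it, because that page (C) is itself "important".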

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
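
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical and serve only as illustration; how Google selects synonyms internally is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("~cheap hotel brussels"))
```

Only the word marked with the tilde is expanded; the other words of the query are left untouched.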

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
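
Recall and precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set of 4 documents; 5 documents are actually relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5", "d6", "d7"]
print(precision_recall(RETRIEVED, RELEVANT))
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 5 relevant documents were retrieved (recall 0.4).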

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
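
For readers unfamiliar with truncation, the following sketch shows what right-hand truncation does in retrieval systems that do support it. The `*` symbol is a common convention but an assumption here, and the function names and sample titles are invented for the example.

```python
def matches_truncated(term, word):
    """True if `word` matches `term`; a trailing * means right-hand truncation."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def search_truncated(term, documents):
    """Return the documents in which at least one word matches the term."""
    return [doc for doc in documents
            if any(matches_truncated(term, w) for w in doc.split())]

# Hypothetical document titles.
DOCS = ["Library catalogues online", "Guide for librarians", "Liberal arts"]
print(search_truncated("librar*", DOCS))
print(search_truncated("library", DOCS))
```

With truncation, one query term covers "library", "libraries", "librarians" and so on; without it, each variant has to be entered separately.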

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Scheme: User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Scheme: User (In / Out) ↔ Client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Scheme: User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme: User (In / Out) ↔ Client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
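
Such an integration with removal of repeated results can be sketched as follows. The two result lists are invented, and the URL normalisation is deliberately crude (an assumption for this example); actual meta-search systems may merge and rank in quite different ways.

```python
def normalise(url):
    """Crude URL normalisation so trivially different forms count as duplicates.

    Note: lowercasing the whole URL is a simplification; paths can be
    case-sensitive on real servers.
    """
    url = url.strip().lower()
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results returned by two primary search systems.
ENGINE_A = ["http://www.eevl.ac.uk/", "http://oceanportal.org/"]
ENGINE_B = ["http://eevl.ac.uk", "http://www.onefish.org/"]
print(merge_results([ENGINE_A, ENGINE_B]))
```

Interleaving by rank position keeps the top hits of every engine near the top of the merged list, while the normalised key prevents the same site from appearing twice.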

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: parts of the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet
(CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
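
The difference between tracking modified pages and finding new pages can be sketched by comparing two snapshots of collected pages. The snapshots below are invented, and fetching the pages and sending e-mail alerts are left out; this is only an assumption about how such a comparison could work, not a description of any particular alert service.

```python
import hashlib

def fingerprint(content):
    """Return a stable digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots (URL -> content); report new and modified URLs."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return new_pages, modified

# Hypothetical snapshots taken at two different moments.
OLD = {
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About us",
}
NEW = {
    "http://example.org/news": "Conference programme published",
    "http://example.org/about": "About us",
    "http://example.org/papers": "New paper list",
}
print(detect_changes(OLD, NEW))
```

Storing only a digest per page is enough to notice that a page changed, without keeping the full earlier text.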

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal bibliographic database does NOT exist,
but the combined forces of the different available book databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
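The use of a thesaurus to prepare a query can be sketched in Python as follows. The mini-thesaurus and its entries are hypothetical and serve only as an illustration; the Boolean OR syntax is also an assumption, since each database has its own query language.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["bay", "gulf"],
        "BT": ["body of water"],
        "RT": ["coast"],
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Build a Boolean OR query from a term plus selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        for other in entry.get(rel, []):
            if other not in terms:
                terms.append(other)
    # Put multi-word terms between quotes so they are searched as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms with OR mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus is what tends to improve precision.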

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
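The two directions of citation searching can be sketched in Python as follows. The citation data are hypothetical; a real citation index covers the reference lists of a large number of publications.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1995"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
    "paper_1990": [],
}

def snowball(seed):
    """Follow reference lists backwards in time, collecting older documents."""
    found = []
    todo = list(REFERENCES.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            # The references of a found document are followed in turn.
            todo.extend(REFERENCES.get(doc, []))
    return found

def cited_by(seed):
    """What a citation index offers: more recent documents citing the seed."""
    return sorted(doc for doc, refs in REFERENCES.items() if seed in refs)
```

The length of the list returned by `cited_by` is the number of citations received by the seed document within this small data set.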

211

***-

Information retrieval:
using citations (scheme)
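[Scheme: a time axis (Past, Now, Future) with the seed document on it;
snowball citation searching points from the seed document towards the past,
citation indexing points towards the future.]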

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
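Generating these variants can be sketched in Python as follows. The function name is hypothetical, and the query syntax for link searching is deliberately left out, because it differs from one search system to another.

```python
def url_variants(url):
    """Return the URL with an index page, with a trailing slash, and bare."""
    base = url
    # Strip a trailing index page, if present.
    for index_page in ("index.html", "index.htm"):
        if base.endswith("/" + index_page):
            base = base[: -len(index_page)]
            break
    # Strip the trailing slash, if present.
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each of the returned variants can then be used in a separate link search, and the results can be combined.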

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 180

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
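The simplified statement above (more links received, higher rank) can be sketched in Python as follows. The link data are hypothetical, and plain counting of received links is an assumption made for illustration only; it is not the actual link analysis used by Google.

```python
# Hypothetical link data: page -> pages it links to.
LINKS = {
    "site_a": ["site_b", "site_c"],
    "site_b": ["site_c"],
    "site_c": [],
    "site_d": ["site_c", "site_b"],
}

def rank_by_inlinks(entries, links):
    """Sort directory entries by links received, then alphabetically."""
    counts = {entry: 0 for entry in entries}
    for source, targets in links.items():
        for target in set(targets):
            # Count each linking page once and ignore links of a page to itself.
            if target in counts and target != source:
                counts[target] += 1
    return sorted(entries, key=lambda entry: (-counts[entry], entry))
```

Entries that receive the same number of links fall back to alphabetical order, which is the order used in the DMOZ directory itself.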

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: a global subject directory, covering the complete WWW, can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
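
The mechanism described above can be sketched in a few lines. This is only a minimal illustration, assuming that the pages have already been collected by the “robot”; the page texts, the example.org URLs and the function names are hypothetical and are not taken from any real search system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as if collected earlier by the "robot".
PAGES = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Civil engineering and marine structures",
    "http://example.org/c": "Fishing news",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine engineering")))
```

The query is answered from the prepared index only; no page is fetched at query time, which is why such systems can respond quickly.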

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
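
The idea of ranking by links can be illustrated with a simplified, PageRank-style calculation. This is a sketch under stated assumptions, not the algorithm as Google runs it: the four-page link graph, the damping value and the function name are hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages count more.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


# Hypothetical link graph: pages a and b both point to c, and c points to d.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
SCORES = link_rank(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

Page c scores higher than a or b because two pages point to it; page d has only one incoming link, but it comes from the well-linked page c, so d also scores higher than a or b.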

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde ( ~ )
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
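
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical, and the real system certainly uses a much larger vocabulary; the sketch only shows how a word marked with a tilde becomes an OR group.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("~cheap car rental"))
# prints: (cheap OR inexpensive OR affordable) car rental
```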

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
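
Recall and precision can be made concrete with a small calculation. The document identifiers below are hypothetical; the definitions used are the standard ones (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.50, recall = 0.67
```

In this example, half of what was retrieved is relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).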

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
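
To show what is missing, here is a sketch of manual right truncation as it works in retrieval systems that do support it. The asterisk as truncation symbol, the sample documents and the function names are assumptions made for this illustration.

```python
import re


def matches_truncated(pattern, word):
    """Match a word against a pattern with an optional trailing * (right truncation)."""
    pattern = pattern.lower()
    word = word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def search_truncated(pattern, documents):
    """Return the names of the documents that contain a matching word."""
    result = []
    for name, text in documents.items():
        words = re.findall(r"[a-z]+", text.lower())
        if any(matches_truncated(pattern, word) for word in words):
            result.append(name)
    return sorted(result)


# Hypothetical documents.
DOCUMENTS = {
    "d1": "Computers in libraries",
    "d2": "Computing the index",
    "d3": "A compact overview",
}

print(search_truncated("comput*", DOCUMENTS))   # prints: ['d1', 'd2']
print(search_truncated("computer", DOCUMENTS))  # prints: []
```

Without truncation, the exact word "computer" misses both "Computers" and "Computing"; the searcher has to enter the variants by hand.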

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
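
For comparison, a proximity operator such as NEAR, in a system that does support it, could work roughly as sketched below. The maximum distance of 3 words and the function name are assumptions for this illustration; real systems differ in how they count distance.

```python
import re


def near(text, word1, word2, distance=3):
    """True when word1 and word2 occur within `distance` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, word in enumerate(words) if word == word1.lower()]
    positions2 = [i for i, word in enumerate(words) if word == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)


TEXT = "Open access sources for marine science"

print(near(TEXT, "access", "marine"))  # prints: True  (3 positions apart)
print(near(TEXT, "open", "science"))   # prints: False (5 positions apart)
```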

96

**--

Meta-search systems:
scheme 1
[Scheme showing: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Scheme showing: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Scheme showing: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Scheme showing: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
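
Clustering on the fly can be illustrated with a very simple sketch that groups result snippets under the term they share most widely. This is not the method used by Vivisimo, which is not described here; the snippets, the stopword list and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared term they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        word_sets.append(words - query_words - STOPWORDS)
    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(word for words in word_sets for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, word_sets):
        if words:
            # Most widely shared term; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]

for label, members in cluster_results("pascal", SNIPPETS).items():
    print(label, "->", members)
```

Only the retrieved snippets are analysed, after the search has been made, so no pre-processing of the whole document collection is required.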

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme showing: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
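
The time saving comes from sending one query to several systems in parallel. A minimal sketch of that idea follows; the two "engines" are hypothetical stand-in functions that return fixed results, where a real meta-search system would contact remote search systems over the network.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote search systems.
def engine_one(query):
    return [f"http://one.example/{query}/1", f"http://shared.example/{query}"]


def engine_two(query):
    return [f"http://shared.example/{query}", f"http://two.example/{query}/1"]


ENGINES = {"engine_one": engine_one, "engine_two": engine_two}


def meta_search(query, engines=ENGINES):
    """Send one query to all engines in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


for name, results in meta_search("tides").items():
    print(name, results)
```

The user enters the query once, in one interface, and waits roughly as long as the slowest engine rather than for the sum of all engines.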

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
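
Integration with removal of repeated results can be sketched as follows. The interleaving by rank and the crude URL normalisation (only surrounding spaces and a trailing slash are ignored) are assumptions for this illustration; real systems use their own, more refined merging rules.

```python
def normalize(url):
    """Reduce trivially different spellings of a URL to one form."""
    return url.strip().rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
FROM_SYSTEM_1 = ["http://a.example/", "http://b.example/x"]
FROM_SYSTEM_2 = ["http://b.example/x", "http://a.example", "http://c.example/"]

print(merge_results([FROM_SYSTEM_1, FROM_SYSTEM_2]))
# prints: ['http://a.example/', 'http://b.example/x', 'http://c.example/']
```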

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme showing the parts of the Internet:
»telnet, ftp, ...
»Internet / WWW
»Databases and file archives accessible through the Internet (CGI, ASP, ...)
»Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
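
How such a tracking service can distinguish new pages from modified ones can be sketched with content fingerprints. The storage format, the short URL labels and the function names are assumptions for this illustration; fetching the pages and sending the e-mail alerts are left out.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page contents.

    `previous` maps URL -> fingerprint from the last check.
    `current` maps URL -> page text fetched now.
    Returns (new_urls, modified_urls, updated_fingerprints).
    """
    new_urls, modified_urls = [], []
    updated = {}
    for url, content in current.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new_urls.append(url)
        elif previous[url] != digest:
            modified_urls.append(url)
    return sorted(new_urls), sorted(modified_urls), updated


# Hypothetical state from the previous check and pages fetched today.
PREVIOUS = {"u1": fingerprint("old text"), "u2": fingerprint("same text")}
CURRENT = {"u1": "new text", "u2": "same text", "u3": "fresh page"}

new_urls, modified_urls, _ = detect_changes(PREVIOUS, CURRENT)
print("new:", new_urls)            # prints: new: ['u3']
print("modified:", modified_urls)  # prints: modified: ['u1']
```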

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
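
Matching new books against a stored interest profile can be sketched as follows. The record fields (title, category), the profile format and the sample data are hypothetical; they do not describe how Amazon or any other bookshop stores profiles.

```python
import re


def matches_profile(book, profile):
    """True when a title contains a profile keyword or the book is in a profile category."""
    title_words = set(re.findall(r"[a-z]+", book["title"].lower()))
    keywords = {keyword.lower() for keyword in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def alerts(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]


# Hypothetical interest profile stored on the server, and newly published books.
PROFILE = {"keywords": ["oceanography"], "categories": ["Library science"]}
NEW_BOOKS = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Managing Digital Collections", "category": "Library science"},
    {"title": "Cooking for Beginners", "category": "Food"},
]

print(alerts(NEW_BOOKS, PROFILE))
# prints: ['Introduction to Oceanography', 'Managing Digital Collections']
```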

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ (available since 2005)

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
The relations of a term in a thesaurus:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
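
The use of a thesaurus before searching can be sketched as a simple query expansion. The thesaurus entry below is illustrative only and is not taken from any real thesaurus system; the function `expand_query` and the `OR` query syntax are also assumptions, since each database has its own query language.

```python
# Hypothetical thesaurus entry; the terms are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym (use(d) for)
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the selected thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in terms:
                terms.append(related)
    # Put multi-word terms between quotes so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term from the thesaurus instead of a vague one is what helps precision.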

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
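
Both directions of citation searching can be illustrated with a small sketch. The data in `CITES` are invented for the example, and the function names are hypothetical; a real citation index covers millions of documents and is built by a database producer.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, cites):
    """Follow references backwards in time, collecting all older documents."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in cites.get(doc, []):
            if ref not in found and ref != seed:
                found.append(ref)
                queue.append(ref)  # also follow the references of this one
    return found


def build_citation_index(cites):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, refs in cites.items():
        for ref in refs:
            index.setdefault(ref, []).append(citing)
    return index
```

With these data, `snowball("seed", CITES)` leads to the older documents, while `build_citation_index(CITES)["seed"]` lists the more recent documents that cite the seed.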

211

***-

Information retrieval:
using citations (scheme)
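[Scheme: a time axis running from past to future. Starting from the
seed document, snowball citation searching leads back to older
publications; citation indexing leads forward to more recent
publications.]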

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
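
The two counts described above (citations received per journal article and per author) can be sketched as follows. The records in `CITATIONS` are invented, and the function names are hypothetical. The resulting numbers are only estimates, because they depend on which citing documents the database covers.

```python
from collections import Counter

# Hypothetical result of a citation search:
# (citing document, cited document, author of the cited document)
CITATIONS = [
    ("paper_3", "paper_1", "Author A"),
    ("paper_4", "paper_1", "Author A"),
    ("paper_4", "paper_2", "Author A"),
    ("paper_5", "paper_6", "Author B"),
]


def citations_per_article(citations):
    """Count how many times each cited article occurs in the search result."""
    return Counter(cited for _, cited, _ in citations)


def citations_per_author(citations):
    """Count how many citations each author received over all articles."""
    return Counter(author for _, _, author in citations)
```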

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
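
Generating the variants of a URL can be automated with a small helper such as the sketch below. The function `url_variants` is hypothetical, and it assumes that `index.html` is the default document of the directory, which is common but depends on the configuration of the web server.

```python
def url_variants(url):
    """Return the common spellings under which a page may be linked.

    Assumption: "index.html" is the default document of the directory.
    A URL without a trailing slash is treated as a directory, which is
    not correct when it points to a single file.
    """
    variants = [url]
    if url.endswith("/index.html"):
        base = url[: -len("index.html")]  # keeps the trailing "/"
        variants += [base, base.rstrip("/")]
    elif url.endswith("/"):
        variants += [url + "index.html", url.rstrip("/")]
    else:
        variants += [url + "/", url + "/index.html"]
    return variants
```

Each variant can then be submitted as a separate link search, using the query syntax of the chosen search system.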

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 181

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents,
even though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
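
The simplified statement above (more links received means a higher rank) can be sketched as follows. This is only an illustration of that simplification and not the ranking method that Google actually uses; the function `rank_by_inlinks` and the example data are hypothetical.

```python
from collections import Counter


def rank_by_inlinks(directory_entries, links):
    """Order directory entries by the number of links that they receive.

    `links` is an iterable of (source page, target site) pairs.
    Entries with the same number of links stay in alphabetical order,
    as in the original directory.
    """
    inlinks = Counter(target for _, target in links)
    alphabetical = sorted(directory_entries)
    # sorted() is stable, so ties keep their alphabetical order.
    return sorted(alphabetical, key=lambda site: -inlinks[site])
```

For example, with two links to `c.example`, one to `b.example` and none to `a.example`, the alphabetical order a, b, c becomes c, b, a.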

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
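
The mechanism described above (collected pages, an index built from their texts, and a search function on top of it) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the pages are given as a small dictionary instead of being collected by a robot, the function names are hypothetical, and this sketch requires every query word to occur in a page.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs whose text contains every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages standing in for what a crawler would collect.
pages = {
    "http://example.org/a": "Marine science and oceanography",
    "http://example.org/b": "Civil engineering and marine structures",
}
index = build_index(pages)
hits = search(index, "marine engineering")
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such systems can answer quickly.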

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
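
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis iteration. This is a sketch in the spirit of such algorithms, not the algorithm actually used by Google; the damping value, the number of iterations and the toy link graph are assumptions.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Compute a score per page from the link graph.

    links: dict mapping each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: a and b point to c, c points to d, e points to a.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["a"]}
scores = link_scores(links)
```

In this toy graph, c scores higher than a because two pages point to it, and d scores higher than a although both receive only one link, because the page that points to d is itself highly ranked.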

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
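
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function expand_query are hypothetical and only illustrate the principle; how Google selects its synonyms is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by the word plus its synonyms, joined with OR."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.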

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
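
Recall and precision, mentioned above, can be computed when it is known which documents are relevant. A minimal sketch with hypothetical document identifiers: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant,
# 2 of the retrieved documents are among the relevant ones.
precision, recall = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d5"],
)
```

Here precision is 2/4 and recall is 2/3. In practice the complete set of relevant documents on the WWW is not known, so recall can only be estimated.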

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
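
For readers not familiar with truncation: in systems that support it, a query term such as librar* matches every indexed word that starts with the given characters. A minimal sketch, in which the asterisk as truncation symbol and the small vocabulary are assumptions:

```python
def expand_truncation(pattern, vocabulary):
    """Return the indexed words matched by a right-truncated term.

    pattern: a term ending in "*" (truncated) or a plain word
    vocabulary: the set of words present in the index
    """
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    return [pattern] if pattern in vocabulary else []

# Hypothetical index vocabulary.
vocabulary = {"library", "libraries", "librarian", "liberty"}
variants = expand_truncation("librar*", vocabulary)
```

Without such a feature, the searcher has to type the variants (library, libraries, librarian…) in the query explicitly.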

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
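
What a proximity operator such as NEAR does in systems that offer it can be sketched as follows. The function near and the default distance of 5 words are assumptions for illustration; each retrieval system defines its own operator and distance.

```python
import re

def near(text, word_a, word_b, distance=5):
    """True when word_a and word_b occur within `distance` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)

sentence = "open access to scientific information"
close_enough = near(sentence, "open", "information")       # 4 words apart
too_far = near(sentence, "open", "information", distance=2)
```

A Boolean AND only requires both words somewhere in the document; a proximity operator is stricter and therefore tends to give higher precision.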

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
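
Clustering on the fly can be illustrated with a very simple approach that works only on the retrieved result texts. This is not the algorithm of Vivisimo (which is not described here); the stopword list, the labelling rule and the example results are assumptions for this sketch.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "of", "the"}

def terms(text, query):
    """Distinct words of a result text, without stopwords and the query word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - {query.lower()}

def cluster(results, query):
    """Group results under the word they share with most other results."""
    doc_freq = Counter()
    for text in results:
        doc_freq.update(terms(text, query))
    clusters = defaultdict(list)
    for text in results:
        candidates = terms(text, query)
        if candidates:
            # Most widely shared word; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(text)
    return dict(clusters)

# Hypothetical result texts for the ambiguous query "pascal".
results = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal, philosopher of religion",
    "Free Pascal compiler for the Pascal programming language",
]
groups = cluster(results, "pascal")
```

The results about the person and the results about the programming language end up under different headings, computed only from the retrieved texts at search time.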

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
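
The integration of results with removal of repeated results can be sketched as follows. The two "engines" are hypothetical stand-in functions that return fixed lists, so no real search system is contacted; treating addresses with and without www. as the same site is also an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical primary search systems returning fixed result lists.
def engine_one(query):
    return ["http://www.example.org/", "http://example.org/page"]

def engine_two(query):
    return ["http://example.org", "http://example.net/other"]

merged = meta_search("oceanography", [engine_one, engine_two])
```

The first result of the second engine is dropped because it points to the same site as a result already received from the first engine.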

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
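
The distinction between modified and new pages can be made by comparing a stored fingerprint of each known page with the current content. A minimal sketch, in which the page contents are given as strings (nothing is downloaded) and the function names are hypothetical:

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare the current pages with the fingerprints stored earlier.

    previous: dict mapping URL -> fingerprint stored at the last check
    current: dict mapping URL -> page text retrieved now
    """
    report = {"new": [], "modified": [], "unchanged": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(text):
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report

# Hypothetical state stored at the previous check.
previous = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
# Hypothetical pages found at the current check.
current = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
report = detect_changes(previous, current)
```

An alert service would run such a comparison at regular intervals and send the "new" and "modified" lists to the subscriber.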

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
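
How a stored interest profile can be matched against a newly published book can be sketched as follows. The record fields (title, category), the profile structure and the matching rule are assumptions for illustration; they do not describe how any particular bookshop implements its alerts.

```python
def matches_profile(book, profile):
    """True when a new book fits the keywords or subject categories of a user."""
    title_words = set(book["title"].lower().split())
    keywords = {keyword.lower() for keyword in profile["keywords"]}
    keyword_hit = bool(title_words & keywords)
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

# Hypothetical interest profile stored on the server of the system.
profile = {"keywords": ["oceanography"], "categories": ["Library science"]}

# Hypothetical records of newly published books.
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Cataloguing Today", "category": "Library science"},
    {"title": "Garden Design", "category": "Gardening"},
]
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
```

The first book matches through a keyword, the second through a subject category, and the third does not match the profile.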

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
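
The idea of a union catalogue can be illustrated with a minimal Python sketch. It assumes that records from different libraries can be matched on ISBN; this is a simplification chosen for the example, and the function `merge_catalogues` and the sample records are hypothetical. Real union catalogues such as Copac rely on more elaborate record matching.

```python
def merge_catalogues(catalogues):
    """Merge several library catalogues into one union catalogue.

    catalogues maps a library name to a list of (isbn, title) records.
    Records with the same ISBN are treated as the same book (an assumption).
    """
    union = {}
    for library, records in catalogues.items():
        for isbn, title in records:
            entry = union.setdefault(isbn, {"title": title, "holdings": []})
            entry["holdings"].append(library)
    return union


# Made-up sample data for two libraries.
catalogues = {
    "library_a": [("111", "Marine Ecology"), ("222", "Ocean Currents")],
    "library_b": [("222", "Ocean Currents"), ("333", "Coastal Management")],
}
union = merge_catalogues(catalogues)
```

Each merged record lists the libraries that hold the book, which is the main added value of a union catalogue for the searcher.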

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
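
A simultaneous search over several targets can be sketched as follows. The catalogue names, their contents and the function names are hypothetical; a real service would send the query to remote systems instead of searching in-memory lists. One target is deliberately marked as unavailable to show how the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "catalogues"; None simulates an unavailable target.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Ocean Currents"],
    "library_b": ["Ocean Currents", "Coastal Management"],
    "bookdealer_c": None,
}


def search_one(name, query):
    """Search one target; only a simple title-word match is supported."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not available")
    return [t for t in titles if query.lower() in t.lower()]


def meta_search(query):
    """Query all targets in parallel and merge the answers per title."""
    results, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
        for name, future in futures.items():
            try:
                for title in future.result():
                    results.setdefault(title, []).append(name)
            except ConnectionError:
                failed.append(name)
    return results, failed


results, failed = meta_search("ocean")
```

The sketch supports only the simplest kind of query because it has to work with every target, which mirrors the limited search options mentioned above.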

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Starting from a term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
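
As a rough illustration of this use of a thesaurus, the sketch below expands a search term with terms taken from the BT / NT / RT / UF relations and builds a query string. The thesaurus entry is invented for the example, and the `OR` operator and the double quotes around phrases are assumptions about the query syntax of the target database.

```python
# Hypothetical thesaurus entry; the terms are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],          # synonyms
        "BT": ["body of water"],  # broader terms
        "NT": ["bay"],            # narrower terms
        "RT": ["coast"],          # related terms
    },
}


def expand_query(term, relations=("UF",)):
    """Build an OR query from a term and its thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put phrases of several words between double quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


synonyms_only = expand_query("sea")
with_narrower = expand_query("sea", ("UF", "NT"))
with_broader = expand_query("sea", ("UF", "BT"))
```

Adding synonyms and narrower terms tends to increase recall; choosing a more suitable, more specific term from the thesaurus instead of the original word tends to increase precision.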

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
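
The two directions of citation searching can be sketched with a small, invented set of documents. The document names and reference lists are hypothetical; a real citation index is of course built from the reference lists of millions of publications.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990", "c_1985"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "b_1995"],
}


def snowball(seed):
    """Follow reference lists back in time, again and again (snowball effect)."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for ref in REFERENCES.get(doc, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def build_citation_index():
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, refs in REFERENCES.items():
        for cited in refs:
            index.setdefault(cited, []).append(citing)
    return index


older = snowball("seed_2000")                # leads to the past
newer = build_citation_index()["seed_2000"]  # leads to more recent work
```

The length of `newer` is also the number of citations received by the seed document within this small collection.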

211

***-

Information retrieval:
using citations (scheme)
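[Scheme: a time axis running from past to future. Starting from the seed document, snowball citation searching leads to older publications, and citation indexing leads to more recent publications.]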

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(ISI = the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
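
The variants listed above can be generated automatically, as in the following sketch. The function name `url_variants` is hypothetical, and the sketch assumes that the start page is called `index.html`; other servers may use a different default file name.

```python
def url_variants(url):
    """Return the common spellings under which a start page may be linked."""
    base = url
    # Reduce the URL to the form without "/index.html" and without final "/".
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
```

Each variant can then be submitted as a separate link query to the search system, using the syntax described in its help pages.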

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 182

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
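
The difference between the two orderings can be shown with a toy example. The site names and link counts are invented, and sorting by a raw count of inbound links is only the "simply stated" version given above; the link analysis actually used by Google is more complex.

```python
# Hypothetical directory category: (site name, number of inbound links).
entries = [
    ("Alpha Aquarium", 12),
    ("Beta Marine Lab", 340),
    ("Coral Watch", 95),
]

# DMOZ-style ordering: alphabetical by name.
alphabetical = sorted(name for name, _ in entries)

# Simplified Google-style ordering: most linked-to sites first.
by_links = [name for name, _ in sorted(entries, key=lambda e: e[1], reverse=True)]
```

In this example the site with the most inbound links moves from the middle of the alphabetical list to the top of the ranked list.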

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the complete contents of
computers across the live Internet in real time
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by querying a large index that is built automatically
from the contents (the texts) of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
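The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the pages, the URLs and the function names are hypothetical, the "robot" is replaced by a fixed dictionary of already collected pages, and the query words are combined with an implicit AND. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering of harbours",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (assumed AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "marine"))
print(search(INDEX, "marine hydrology"))
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is why such systems can respond quickly.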

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
the more familiar Google web search and the
Amazon.com book search system separately.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi, has been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
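The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch under assumptions, not the algorithm that Google actually runs: the link graph, the damping value 0.85 and the function name `link_rank` are all hypothetical choices for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, and C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
RANKS = link_rank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph, C scores higher than A or B because two pages point to it, and D scores well although only one page points to it, because that one page (C) is itself highly ranked.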

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, which can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, which can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms and related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
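What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are hypothetical, and the real synonym lists used by Google are not public; the sketch only shows how a word marked with a tilde could be rewritten into an OR-group.

```python
# Hypothetical, hand-made synonym table; a real system would rely on a
# much larger thesaurus.
SYNONYMS = {
    "fishing": ["angling", "fisheries"],
    "guide": ["manual", "handbook"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("north sea ~fishing"))
# prints: north sea (fishing OR angling OR fisheries)
```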

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company,
Yahoo!, since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system searches one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered at http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
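Recall and precision, the two quality measures named on this slide, can be made concrete with a small calculation. The document identifiers below are hypothetical; the definitions used are the standard ones: precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 5 relevant documents exist,
# and 2 of the retrieved documents are relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d2", "d5", "d6", "d7"]
PRECISION, RECALL = precision_recall(RETRIEVED, RELEVANT)
print(PRECISION, RECALL)
# prints: 0.5 0.4
```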

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
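For readers unfamiliar with truncation, the following sketch shows what manual right-hand truncation does in systems that support it. The `*` symbol and the function name `matches` are assumptions for illustration; the truncation symbol differs between retrieval systems.

```python
def matches(term, word):
    """Match a query term against a word; a trailing * means truncation."""
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return term == word

WORDS = ["library", "libraries", "librarian", "liberty"]
TRUNCATED = [w for w in WORDS if matches("librar*", w)]
EXACT = [w for w in WORDS if matches("library", w)]
print(TRUNCATED)  # ['library', 'libraries', 'librarian']
print(EXACT)      # ['library']
```

Without truncation or stemming, the searcher has to enter each word form separately to obtain the same coverage.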

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
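A proximity operator such as NEAR can be sketched as a test on word positions. This is a minimal illustration; the function name `near` and the default distance of 5 words are assumptions, and systems that offer NEAR each define their own maximum distance.

```python
import re

def near(text, word1, word2, distance=5):
    """True when word1 and word2 occur within `distance` words of each other."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)

TEXT = "Marine pollution in the North Sea"
print(near(TEXT, "marine", "sea", distance=5))  # True
print(near(TEXT, "marine", "sea", distance=2))  # False
```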

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
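Clustering on the fly can be illustrated with a deliberately crude sketch that groups result titles under the word they share most often with other titles. This is not the Vivisimo method, which is not described in these slides; the stop word list, the titles and the function name `cluster` are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}

def cluster(query, titles):
    """Group titles under the non-query word that occurs in most titles."""
    query_words = set(query.lower().split())

    def words(title):
        return [w for w in re.findall(r"[a-z]+", title.lower())
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each remaining word occurs.
    doc_freq = Counter(w for t in titles for w in set(words(t)))
    clusters = defaultdict(list)
    for title in titles:
        candidates = sorted(words(title))
        # Ties are broken alphabetically because max() keeps the first maximum.
        label = max(candidates, key=lambda w: doc_freq[w]) if candidates else "other"
        clusters[label].append(title)
    return dict(clusters)

TITLES = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher and mathematician",
]
CLUSTERS = cluster("pascal", TITLES)
for label, members in CLUSTERS.items():
    print(label, members)
```

The grouping is computed only from the results returned for this one query, so nothing has to be classified before the search takes place.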

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based
information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
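The integration step, including the parallel ("multi-threaded") querying shown in the schemes earlier, can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists instead of contacting real search systems, and treating URLs that differ only by a trailing slash as duplicates is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]

def engine_b(query):
    return ["http://example.org/2/", "http://example.org/3"]

def normalise(url):
    """Crude duplicate key: ignore a trailing slash."""
    return url.rstrip("/")

def meta_search(query, engines):
    """Query all engines in parallel and merge results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

MERGED = meta_search("hydrology", [engine_a, engine_b])
print(MERGED)
# prints: ['http://example.org/1', 'http://example.org/2', 'http://example.org/3']
```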

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
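How a tracking service can tell a modified page from a new one can be sketched with stored fingerprints. This is an assumption about one simple approach, not a description of any particular service; page contents are passed in as strings here, whereas a real service would first fetch them from the WWW.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_pages(current_pages, stored):
    """Compare current contents with stored fingerprints.

    Returns (new_urls, changed_urls) and updates `stored` in place.
    """
    new, changed = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new.append(url)
        elif stored[url] != digest:
            changed.append(url)
        stored[url] = digest
    return new, changed

STORED = {}
FIRST = check_pages({"http://example.org/a": "v1",
                     "http://example.org/b": "v1"}, STORED)
SECOND = check_pages({"http://example.org/a": "v2",
                      "http://example.org/b": "v1",
                      "http://example.org/c": "v1"}, STORED)
print(FIRST)
print(SECOND)
```

On the second run only the page whose content changed and the page that was not seen before are reported, which is the information an alert e-mail would contain.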

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
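How a stored interest profile could be matched against newly published books can be sketched as below. The class, the field names and the sample data are assumptions made for illustration; they do not describe how any particular bookshop implements its alerting service.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical stored profile: lowercase keywords and/or subject categories."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, category):
    """True when a title word is a profile keyword or the category is of interest."""
    title_words = {word.strip(".,:;!?").lower() for word in title.split()}
    return bool(profile.keywords & title_words) or category in profile.categories


def alerts_for(new_books, profiles):
    """Map each user to the titles of the new books that fit the profile."""
    return {
        p.user: [title for title, category in new_books if matches(p, title, category)]
        for p in profiles
    }


profiles = [
    InterestProfile("anna", keywords={"aquaculture"}),
    InterestProfile("ben", categories={"oceanography"}),
]
new_books = [
    ("Sustainable Aquaculture: an introduction", "fisheries"),
    ("Waves and Tides", "oceanography"),
    ("Medieval Poetry", "literature"),
]
alerts = alerts_for(new_books, profiles)
```

A real service would run such a match each time the catalogue is updated and send the result by email.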

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
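The principle of simultaneous searching can be sketched as follows. The catalogues here are invented in-memory lists (an assumption; real meta-search services send the query to remote systems, for instance over HTTP or Z39.50), and one of them is marked as unavailable to show how the result depends on the target systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalogues (assumption).
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Ecology", "Ocean Chemistry"],
    "bookdealer_c": None,  # simulates a target system that is not available
}


def search_one(name, query):
    """Search one catalogue for titles that contain the query string."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not available")
    return [title for title in titles if query.lower() in title.lower()]


def meta_search(query):
    """Query all catalogues in parallel; merge hits and report failed targets."""
    merged, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
    # Leaving the "with" block waits until every search has finished.
    for name, future in futures.items():
        try:
            for title in future.result():
                merged.setdefault(title, []).append(name)
        except ConnectionError:
            failed.append(name)
    return merged, failed


merged, failed = meta_search("marine")
```

Only a plain substring query is passed on, which mirrors the limited search options: the meta-search system can offer only what all target systems understand.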

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
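The use of a thesaurus to prepare a query can be sketched as below. The thesaurus entry is invented for illustration, and the `OR` syntax with quoted phrases is an assumption: the exact query syntax differs from one search system to another.

```python
# Hypothetical thesaurus entry (assumption), using the usual relation codes:
# UF = synonyms, NT = narrower terms, BT = broader terms, RT = related terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["lagoon"],
        "RT": ["coast"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms in the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea")
```

Adding synonyms and narrower terms tends to increase recall; adding broader or related terms widens the search further and can lower precision, so the searcher chooses the relations to follow.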

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
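Both directions of citation searching can be sketched with a small, invented citation graph (the document names are assumptions). Following reference lists leads to older work; inverting the same data gives a citation index that leads to more recent work.

```python
# Hypothetical data (assumption): document -> documents in its reference list.
REFERENCES = {
    "seed_2000": ["a_1990", "b_1995"],
    "b_1995": ["a_1990", "c_1985"],
    "d_2003": ["seed_2000"],
    "e_2004": ["seed_2000", "d_2003"],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {
            ref for doc in frontier for ref in REFERENCES.get(doc, [])
        } - found
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> set of citing documents."""
    index = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


CITATION_INDEX = build_citation_index(REFERENCES)
older = snowball("seed_2000")                       # snowball effect
newer = CITATION_INDEX.get("seed_2000", set())      # citation index lookup
```

The size of an entry in `CITATION_INDEX` is the number of citations received by that document within this small data set.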

211

***-

Information retrieval:
using citations (scheme)
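[Diagram: a time axis running from past through now to future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.]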

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
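The variants listed above can be generated with a small helper, sketched here. It assumes that the default document of the site is named `index.html` (or `index.htm`); other servers may use another default name, and the link-search syntax itself must still be taken from the help pages of the search system.

```python
def url_variants(url):
    """Return common spellings of the same web address, without the scheme."""
    base = url.split("://", 1)[-1]
    # Strip an explicit default document name (assumed: index.html / index.htm).
    for default_name in ("index.html", "index.htm"):
        if base.endswith("/" + default_name):
            base = base[: -len(default_name)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant is then searched separately, and the sets of citing pages are combined.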

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 183


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
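The mechanism described above can be illustrated with a minimal sketch in Python. It is only an illustration under stated assumptions: the three pages and their example.org addresses are hypothetical stand-ins for what a "robot" would collect, and real Internet indexes are far more elaborate.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, standing in for what a crawler would collect.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is the point made on the preceding slides.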

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by separately
using the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
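The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched as follows. This is a simplified illustration in the spirit of link analysis, not Google's actual algorithm; the function name `link_rank`, the damping value and the four-page link graph are assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a link graph: dict page -> list of pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A and B point to C, and C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this small graph, C scores higher than A and B because two pages point to it, and D scores high because the "important" page C points to it.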

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
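What such an automatic expansion amounts to can be sketched in a few lines. The synonym table below is hypothetical and tiny, and the function `expand_query` is an illustration only; how Google selects synonyms internally is not described in this presentation.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("~cheap ~car rental"))
```

Only the words marked with a tilde are expanded; the other words of the query are left as they are.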

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
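Recall and precision can be computed directly once it is known which documents are relevant. The sketch below uses hypothetical document identifiers; in practice the full set of relevant documents on the WWW is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents relevant, 2 in common.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(precision, recall)
```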

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
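To show what is missing, here is a sketch of manual right-hand truncation as offered by retrieval systems that do support it. The wildcard symbol `*`, the function name and the small vocabulary are assumptions for the example; the exact symbol differs from system to system.

```python
def expand_truncation(term, vocabulary):
    """Return the index words matched by a right-truncated term such as 'librar*'."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without a truncation symbol only the exact word can match.
    return [term] if term in vocabulary else []


# Hypothetical vocabulary of words that occur in an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "book"}
print(expand_truncation("librar*", vocabulary))
print(expand_truncation("library", vocabulary))
```

Without such a feature, the searcher has to type each variant of the word separately and combine them in the query.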

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
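Clustering on the fly can be illustrated with a deliberately naive sketch: each retrieved snippet is labelled with the word it shares with the most other snippets, using only the result list itself. This is not the method used by Vivisimo, which is not described here; the stopword list, the function name and the four snippets are assumptions for the example.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group result snippets under a shared word, without any pre-processing."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    # Words of each snippet, without stopwords and without the query words.
    words_per_snippet = [
        set(re.findall(r"[a-z]+", s.lower())) - STOPWORDS - query_words
        for s in snippets
    ]
    # Count in how many snippets each word occurs.
    frequency = Counter(w for words in words_per_snippet for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        if words:
            # Label: the most widely shared word; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, French philosopher",
    "Pascal the philosopher and his Pensees",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
]
clusters = cluster_results("pascal", snippets)
for label, members in sorted(clusters.items()):
    print(label, len(members))
```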

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
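The integration step, with removal of repeated results, might look like the following sketch. The scoring rule (a result gets more weight the higher it appears in each list) and the crude normalisation of addresses are assumptions chosen for the example, not the method of any particular meta-search system.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several search systems, removing repeats."""
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # Crude normalisation: ignore a trailing slash when comparing URLs.
            key = url.rstrip("/")
            entry = scores.setdefault(key, {"url": url, "score": 0.0})
            # A higher position in a list contributes a higher score.
            entry["score"] += 1.0 / (position + 1)
    ranked = sorted(scores.values(), key=lambda e: (-e["score"], e["url"]))
    return [e["url"] for e in ranked]


# Hypothetical result lists returned by two primary search systems.
engine1 = ["http://a.example/", "http://b.example/page", "http://c.example/"]
engine2 = ["http://b.example/page", "http://a.example", "http://d.example/"]
merged = merge_results([engine1, engine2])
print(merged)
```

Results found by both systems appear only once and end up at the top of the merged list.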

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
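The difference between modified pages and new pages can be made concrete with a small sketch. It assumes that the tracking system stores a fingerprint of each page it has seen before; the function names, the example.org addresses and the page texts are hypothetical, and fetching the pages over the network is left out.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length summary of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints (url -> hash) with current texts (url -> text).

    Returns two sorted lists: modified pages and new pages.
    """
    modified = sorted(
        url for url, text in current.items()
        if url in previous and previous[url] != fingerprint(text)
    )
    new = sorted(url for url in current if url not in previous)
    return modified, new


# Hypothetical state stored during an earlier visit.
previous = {
    "http://example.org/news": fingerprint("Old announcement"),
    "http://example.org/about": fingerprint("About this site"),
}
# Hypothetical page texts collected today.
current = {
    "http://example.org/news": "New announcement",
    "http://example.org/about": "About this site",
    "http://example.org/report": "A page that did not exist before",
}
modified, new = detect_changes(previous, current)
print(modified, new)
```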

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available free of charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
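
As an illustration only, the matching of a new book against a stored interest profile could be sketched as follows. The class `InterestProfile`, the function `matches` and the sample books are hypothetical; they do not describe how Amazon or any other real service works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user's stored interests: free keywords and subject categories."""
    keywords: set[str] = field(default_factory=set)
    subjects: set[str] = field(default_factory=set)


def matches(profile: InterestProfile, title: str, subjects: set[str]) -> bool:
    """Return True when a new book fits the profile.

    A book fits when any profile keyword occurs as a word in its title
    (case-insensitive) or when it shares at least one subject category.
    """
    title_words = {w.strip(".,:;!?()").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    subject_hit = bool({s.lower() for s in profile.subjects}
                       & {s.lower() for s in subjects})
    return keyword_hit or subject_hit


# Hypothetical profile and hypothetical newly published books.
profile = InterestProfile(keywords={"aquaculture"}, subjects={"Marine biology"})
new_books = [
    ("Aquaculture and the Environment", {"Fisheries"}),
    ("A History of Bookbinding", {"Book arts"}),
    ("Coastal Ecosystems", {"marine biology"}),
]
alerts = [title for title, subj in new_books if matches(profile, title, subj)]
print(alerts)
```

A real service would probably also match on author names or publishers; the sketch is limited to the two profile forms named on the slide.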

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
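
A minimal sketch of such a meta-search, assuming three hypothetical target catalogues that are simulated here by local functions (no real library system is contacted). It shows why the result depends on the availability of the targets: a target that fails is simply skipped.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote catalogues: each maps a query to titles.
def search_library_a(query):
    catalogue = ["Marine Ecology", "Ocean Chemistry"]
    return [t for t in catalogue if query.lower() in t.lower()]


def search_library_b(query):
    catalogue = ["Marine Ecology", "Marine Pollution"]
    return [t for t in catalogue if query.lower() in t.lower()]


def search_unavailable(query):
    raise ConnectionError("target system is down")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results.

    Targets that fail are skipped and reported separately.
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [(name, pool.submit(fn, query)) for name, fn in targets.items()]
        for name, future in futures:
            try:
                for title in future.result():
                    if title not in merged:  # drop duplicates across targets
                        merged.append(title)
            except ConnectionError:
                failed.append(name)
    return merged, failed


targets = {"A": search_library_a, "B": search_library_b, "C": search_unavailable}
results, failed = meta_search("marine", targets)
print(results, failed)
```

Only a plain keyword query is passed on, because that is the one option every target can be assumed to support; this is the reason why the search options of meta-search services are rather limited.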

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the AIFF, AU and WAV formats.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
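
The use of thesaurus relations to formulate a query can be sketched as follows. The entry for "ocean" is invented for illustration and is not taken from any real thesaurus; the function `expand_query` and the OR syntax are assumptions, as each search system has its own query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Illustrative entry only; not taken from any real thesaurus.
THESAURUS = {
    "ocean": ThesaurusEntry(
        broader=["water body"],
        narrower=["Atlantic Ocean"],
        related=["coast"],
        used_for=["sea"],
    ),
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and (optionally) its NTs."""
    entry = thesaurus.get(term)
    terms = [term]
    if entry:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    # Multi-word terms are quoted so that they are searched as a phrase.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall. Replacing a term by one of its narrower terms, rather than adding them all, is the corresponding way to increase precision.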

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
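
Both directions of citation searching can be sketched with a small, invented set of reference lists. The document identifiers and the functions `snowball` and `cited_by` are hypothetical; a real citation index is of course far larger and is built by the database producer.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed-2000": ["old-1990", "old-1995"],
    "old-1995": ["old-1985"],
    "new-2003": ["seed-2000"],
    "new-2004": ["seed-2000", "old-1990"],
}


def snowball(seed, references):
    """Follow reference lists backwards in time, transitively."""
    found, to_visit = [], list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found


def cited_by(seed, references):
    """Invert the reference lists, as a citation index does."""
    return sorted(doc for doc, refs in references.items() if seed in refs)


print(snowball("seed-2000", REFERENCES))   # older publications
print(cited_by("seed-2000", REFERENCES))   # more recent, citing publications
```

The length of the list returned by `cited_by` is the number of citations received by the seed document within this data set.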

211

***-

Information retrieval:
using citations (scheme)
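[Scheme: on a time axis (Past, Now, Future), snowball citation
searching leads from the seed document back to older publications,
while citation indexing leads forward to more recent publications
that cite the seed document.]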

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
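
The variants listed above can be generated from one known address, for instance as in the following sketch. The function `url_variants` is hypothetical, and it assumes that `index.html` is the default page of the web server, which is common but not guaranteed.

```python
def url_variants(url):
    """Return the forms under which other pages may link to the same page.

    Assumes that 'index.html' is the server's default page, so that the
    directory with and without a trailing slash denotes the same page.
    """
    default_page = "index.html"
    if url.endswith("/" + default_page):
        base = url[: -len(default_page) - 1]  # strip "/index.html"
    else:
        base = url.rstrip("/")
    return [f"{base}/{default_page}", f"{base}/", base]


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then used in a separate link search, following the query syntax of the search system concerned.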

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
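
As a small sketch of this approach, the URL can be wrapped in double quotes to form an exact phrase query and then encoded for use in the address of a search request. The function `phrase_query` is hypothetical; whether a particular search engine treats double quotes as an exact phrase should be checked in its help pages.

```python
from urllib.parse import quote_plus


def phrase_query(url):
    """Wrap a URL in double quotes so that it is searched as an exact phrase."""
    return f'"{url}"'


query = phrase_query("http://www.computer.com/directory/subdirectory/web-page.html")
print(query)
print(quote_plus(query))  # form suitable for the query part of a search URL
```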

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 184

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other
search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
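
The difference between the two orderings can be illustrated with a small sketch. The site names and inbound-link counts are invented for illustration, and sorting by a raw link count is a deliberate simplification, not a description of the link analysis that Google really applies.

```python
# Hypothetical directory category: (site name, number of inbound links).
# Names and counts are invented for illustration only.
entries = [
    ("Marine Biology Web", 120),
    ("Aquatic Network", 45),
    ("Ocean Link", 310),
]

def alphabetical(entries):
    """DMOZ-style ordering: sort the entries by name."""
    return [name for name, _ in sorted(entries, key=lambda e: e[0].lower())]

def by_inbound_links(entries):
    """Simplified link-based ordering: the most-linked site comes first."""
    return [name for name, _ in sorted(entries, key=lambda e: e[1], reverse=True)]

print(alphabetical(entries))
print(by_inbound_links(entries))
```

The same set of links is shown in both cases; only the order differs.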

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram. Elements: the complete WWW; a global subject directory, which can lead to a directory limited to sources in/of a country or region, or to a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
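
The two components described above can be sketched in a few lines. This is a minimal illustration only: the pages are an invented in-memory sample standing in for what a robot would collect, and real systems add ranking, phrase handling and far larger data structures.

```python
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
# URLs and texts are invented for illustration only.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "civil engineering handbook",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the prebuilt index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.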

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
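
The idea that both the number and the “importance” of linking pages count can be sketched with a simple iterative link score. The link graph, the damping value and the number of iterations are assumptions chosen for illustration; this is not Google's actual implementation.

```python
# Hypothetical link graph: page -> pages it links to (invented for illustration).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    and especially high-scoring pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sample, page C receives the most links and ends up first; page A receives only one link, but from the high-scoring C, and therefore ranks above B.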

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
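
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus and the `OR` syntax are assumptions for illustration; the exact Boolean syntax depends on the retrieval system used.

```python
# Hypothetical mini-thesaurus (terms invented for illustration).
thesaurus = {
    "fish": ["fishes", "ichthyofauna"],
    "pollution": ["contamination"],
}

def expand_query(query, thesaurus):
    """Replace each word that has thesaurus entries by an OR group."""
    parts = []
    for word in query.split():
        alternatives = [word] + thesaurus.get(word.lower(), [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("fish pollution estuary", thesaurus))
```

Words without a thesaurus entry, such as the third word in the sample query, are left unchanged.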

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
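
Building such a query string can be automated with a small helper. The function name is hypothetical, and the sketch only formats the query text; whether a given search system honours the tilde is outside its scope.

```python
def add_synonym_operator(query, words_to_expand):
    """Prefix the selected query words with a tilde (~)."""
    selected = {w.lower() for w in words_to_expand}
    return " ".join(
        "~" + word if word.lower() in selected else word
        for word in query.split()
    )

print(add_synonym_operator("word1 word2 word3 word4", ["word2"]))
```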

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
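
To show what these two missing features would do, here is a minimal sketch. The vocabulary is invented, the `*` truncation symbol is an assumption borrowed from classical retrieval systems, and the stemmer is deliberately crude.

```python
# Hypothetical index vocabulary (invented for illustration).
vocabulary = ["fish", "fisheries", "fishery", "fishing", "finance"]

def truncate_search(pattern, vocabulary):
    """Manual right-hand truncation: 'fish*' matches every word starting with 'fish'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [w for w in vocabulary if w.startswith(prefix)]
    return [w for w in vocabulary if w == pattern]

def naive_stem(word):
    """Very crude stemmer: strip a few common English suffixes."""
    for suffix in ("eries", "ery", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(truncate_search("fish*", vocabulary))
print([naive_stem(w) for w in vocabulary])
```

Without such features, the searcher has to type each word variant explicitly in the query.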

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
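
The proximity operator named in this list can be illustrated with a short sketch. The function name, the distance measure in words and the sample sentence are assumptions for illustration; systems that offer NEAR define the allowed distance in their own way.

```python
def near(text, term1, term2, max_distance=5):
    """True when term1 and term2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == term1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

doc = ("Marine pollution affects coastal fisheries, "
       "while inland water quality is monitored separately.")
print(near(doc, "pollution", "fisheries", max_distance=3))
print(near(doc, "marine", "quality", max_distance=3))
```

A plain AND query would accept both word pairs; the proximity test accepts only the pair that occurs close together.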

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram. Elements: User; an Internet meta-search system; Internet search system 1 and its collected database 1; Internet search system 2 and its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
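
Clustering on the fly can be illustrated with a deliberately simple sketch that groups result snippets under a shared word. The snippets, the stopword list and the grouping rule are assumptions made for illustration; this is not the algorithm that Vivisimo uses.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by"}

# Hypothetical result snippets for the query "pascal" (invented for illustration).
results = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Turbo Pascal programming compiler",
    "Pascal philosopher Pensées",
]

def cluster_results(results, query):
    """Group results under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in results:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
    # Count in how many snippets each word occurs.
    frequency = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(results, tokenised):
        if words:
            # Heading = the word of this snippet shared by most snippets;
            # ties are broken alphabetically to keep the output stable.
            heading = min(words, key=lambda w: (-frequency[w], w))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)

for heading, group in cluster_results(results, "pascal").items():
    print(heading, "->", group)
```

Only the retrieved snippets are analysed, after the search, so no pre-processing of the document collection is needed.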

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than only 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Elements:
telnet, ftp, ...;
Internet / WWW;
Databases and file archives accessible through the Internet;
CGI, ASP,...;
Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes;
Information accessible only when passwords are used;
Rapidly changing information, such as news;
Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
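
Tracking changes in one existing page can be sketched by storing a fingerprint of its text and comparing it on the next visit. The page texts below are invented, and fetching the page from the WWW is left out to keep the sketch self-contained.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with that of the current page text."""
    return fingerprint(current_text) != stored_fingerprint

old_text = "Conference programme, version 1"
stored = fingerprint(old_text)
print(has_changed(stored, "Conference programme, version 1"))
print(has_changed(stored, "Conference programme, version 2"))
```

Only the short fingerprint has to be stored between visits, not the complete page.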

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
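
The principle of an alert service based on stored queries can be sketched as a comparison of two result lists for the same query. The URLs are invented and the function name is hypothetical; how the real service stores and compares results is not described in the source.

```python
def new_results(previous_urls, current_urls):
    """Return the URLs in the current result list that were not seen before."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]

# Hypothetical result lists for one stored query on two successive runs.
monday = ["http://example.org/a", "http://example.org/b"]
tuesday = ["http://example.org/b", "http://example.org/c", "http://example.org/a"]
print(new_results(monday, tuesday))
```

Only the newly found URLs would be sent to the subscriber.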

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both Amazon and A9 allow full-text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
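
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration of the principle described above: the class names, the field names and the simple word-matching rule are assumptions made for this example, not a description of how Amazon or any other service works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical user profile as stored on the server of the system."""
    user: str
    keywords: set = field(default_factory=set)      # free keywords
    categories: set = field(default_factory=set)    # subject categories / fields


@dataclass
class NewBook:
    """Hypothetical description of a newly published book."""
    title: str
    categories: set


def matches(profile, book):
    """True when a keyword occurs in the title or a subject category overlaps."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    profile_cats = {c.lower() for c in profile.categories}
    book_cats = {c.lower() for c in book.categories}
    return keyword_hit or bool(profile_cats & book_cats)


def alerts(profiles, new_books):
    """Return, per user, the titles of the new books that fit the profile."""
    return {p.user: [b.title for b in new_books if matches(p, b)] for p in profiles}
```

A real service would also have to remember which alerts were already sent; that is left out here.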

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
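
A minimal sketch of such simultaneous searching is given below. The target systems are represented by plain Python functions that take a query and return a list of titles; these functions are hypothetical stand-ins, since the real catalogues each have their own interface. The sketch shows why the result depends on the availability of the targets (a failing target is reported, not fatal) and why search options are limited (only one simple query string is passed to all targets).

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(title):
    """Crude key for detecting the same title in several catalogues."""
    return " ".join(title.lower().split())


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results.

    targets: dict mapping a target name to a function(query) -> list of titles.
    Returns (merged records, names of targets that failed).
    """
    def safe_call(item):
        name, search = item
        try:
            return name, search(query), None
        except Exception as exc:          # target unavailable or too slow
            return name, [], exc

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        outcomes = list(pool.map(safe_call, targets.items()))

    merged, failed = {}, []
    for name, titles, error in outcomes:
        if error is not None:
            failed.append(name)
            continue
        for title in titles:
            record = merged.setdefault(normalise(title), {"title": title, "sources": []})
            record["sources"].append(name)
    return list(merged.values()), failed
```

Merging on a normalised title is a deliberate simplification; real systems would compare more bibliographic fields.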

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations starting from a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)
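
As an illustration, these relations can be stored in a small data structure. The sketch below is one possible representation, not the format of any particular thesaurus system; the class name and helper function are invented for this example. It also shows that BT and NT are reciprocal: recording that a term has a broader term implies that the broader term has this term as a narrower term.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with its thesaurus relations (hypothetical structure)."""
    term: str
    broader: list = field(default_factory=list)    # BT: broader terms
    narrower: list = field(default_factory=list)   # NT: narrower terms
    related: list = field(default_factory=list)    # RT: related terms
    used_for: list = field(default_factory=list)   # UF: synonyms / non-preferred terms


def add_broader(thesaurus, term, broader_term):
    """Record 'term BT broader_term' and the reciprocal 'broader_term NT term'."""
    child = thesaurus.setdefault(term, ThesaurusEntry(term))
    parent = thesaurus.setdefault(broader_term, ThesaurusEntry(broader_term))
    if broader_term not in child.broader:
        child.broader.append(broader_term)
    if term not in parent.narrower:
        parent.narrower.append(term)
```

The thesaurus itself is simply a dictionary from term to entry, so a lookup by term is direct.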

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
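
The procedure above can be sketched in code: terms found in a thesaurus are combined with OR within each concept (to increase recall), and the concepts are combined with AND (to keep precision). The thesaurus content and the Boolean syntax with AND / OR and double quotes are assumptions for this example; each real search system has its own query syntax.

```python
def expand_query(concepts, thesaurus):
    """Build a Boolean query from concepts, adding terms found in a thesaurus.

    thesaurus: dict mapping a term to a dict with optional lists under the
    keys "UF" (synonyms), "NT" (narrower terms) and "RT" (related terms).
    """
    groups = []
    for concept in concepts:
        variants = [concept]
        entry = thesaurus.get(concept, {})
        for relation in ("UF", "NT", "RT"):
            for term in entry.get(relation, []):
                if term not in variants:
                    variants.append(term)
        # Phrases of several words are quoted so they are searched as a unit.
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)
```

Whether related terms (RT) should be included is a judgment call: they tend to raise recall at the cost of precision.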

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
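
Both directions can be illustrated with a small sketch. It assumes that the reference lists of a set of documents are available as a dictionary (document -> cited documents); the document identifiers in any usage are invented. Following the references leads to older publications (the snowball effect); inverting the references gives a citation index that leads to more recent publications.

```python
from collections import defaultdict, deque


def build_citation_index(references):
    """Invert 'document -> documents it cites' into 'document -> documents citing it'."""
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)
    return cited_by


def snowball(seed, links, max_depth=2):
    """Follow links from the seed document, breadth first, up to max_depth steps.

    With links = references this finds older documents;
    with links = citation index this finds more recent documents.
    """
    found = set()
    queue = deque([(seed, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in links.get(doc, ()):
            if nxt not in found and nxt != seed:
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found
```

The depth limit matters in practice, because the number of documents found grows quickly with each step.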

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
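
As a rough sketch of how such counts can be estimated from a bibliographic database, the function below counts citations per article and per author. The record layout (keys "id", "authors", "references") is an assumption for this example. The result is only an estimate: citations from documents that are not in the database are not counted.

```python
from collections import Counter


def citation_counts(articles):
    """Count citations received per article and per author.

    articles: list of dicts with the keys "id", "authors" and "references"
    (the ids of cited articles). Only citations between articles that are
    both in the list are counted.
    """
    known_ids = {a["id"] for a in articles}
    per_article = Counter()
    for a in articles:
        for ref in set(a["references"]):      # count a citing article once
            if ref in known_ids:
                per_article[ref] += 1
    per_author = Counter()
    for a in articles:
        for author in a["authors"]:
            per_author[author] += per_article[a["id"]]
    return per_article, per_author
```

Here every co-author receives the full count of an article; other counting schemes (for instance fractional counting) are possible.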

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
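
The variants listed above can be generated automatically before they are submitted to a link search. The sketch below is an illustration only: the default index file name "index.html" is an assumption (servers may use other names), and the query syntax for the actual link search still has to be taken from the help pages of the search system.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the variants of a web site address to try in a link search.

    The scheme (http://) is dropped, and the address is given with the
    index file, with a trailing slash, and without a trailing slash.
    """
    rest = url.split("://", 1)[-1].rstrip("/")
    last = rest.rsplit("/", 1)[-1]
    if "/" in rest and last.lower() in index_names:
        base = rest.rsplit("/", 1)[0]     # strip the index file name
    else:
        base = rest
    return [base + "/" + name for name in index_names] + [base + "/", base]
```

Searching for each variant in turn and merging the results gives a more complete picture of the links to a site.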

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
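The difference between an alphabetical listing and a listing ordered by received links can be sketched as follows. This is only a simplified illustration of "the more links received, the higher the rank"; the site names and link pairs are hypothetical, and the real ranking used by Google is more elaborate.

```python
from collections import Counter


def rank_alphabetically(entries):
    """Order directory entries by name only."""
    return sorted(entries)


def rank_by_inbound_links(entries, links):
    """Order directory entries by the number of links they receive, most linked first.

    links is an iterable of (source, target) pairs; links from a site
    to itself are ignored. Ties are broken alphabetically.
    """
    inbound = Counter(target for source, target in links if source != target)
    return sorted(entries, key=lambda site: (-inbound[site], site))


if __name__ == "__main__":
    # Hypothetical entries of one directory category and links found on the WWW.
    entries = ["a.example", "b.example", "c.example"]
    links = [("x.example", "b.example"), ("y.example", "b.example"), ("x.example", "c.example")]
    print(rank_alphabetically(entries))
    print(rank_by_inbound_links(entries, links))
```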

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
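The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any real search engine: the dictionary of pages stands in for what the robot has already collected, the addresses are hypothetical, and requiring every query word to be present is an assumption of this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: for each word, the set of URLs of pages that contain it.

    pages maps a URL to the text of that page, as collected earlier by the robot.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    pages = {
        "http://example.org/a": "Marine biology of the North Sea",
        "http://example.org/b": "Hydrology and marine science",
    }
    index = build_index(pages)
    print(search(index, "marine biology"))
```

The query is answered from the stored index only; no page on the Internet is contacted at search time, which is why such systems can answer quickly.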

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible in many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
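Both criteria (many pages point to it; "important" pages point to it) can be combined in an iterative calculation. The sketch below uses the textbook PageRank-style iteration as an assumption; the slides do not describe the formula that Google really applies, and the damping value 0.85 and the tiny link graph are illustrative choices.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a link-based score for each page.

    links maps each page to the list of pages it points to.
    A page passes its score on, in equal shares, to the pages it links to,
    so a link from a high-scoring page is worth more than one from a low-scoring page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: "c" receives many links; "a" and "e" receive one link each,
    # but the link to "a" comes from the important page "c".
    links = {"a": ["c"], "b": ["c", "e"], "c": ["a"], "d": ["c"], "e": ["c"]}
    for page, score in sorted(link_rank(links).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this small graph, pages "a" and "e" each receive exactly one link, yet "a" ends up far higher because its single link comes from the most linked page.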

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
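What such an expansion amounts to can be sketched as follows. The synonym table is hypothetical and written by hand for this illustration; how Google selects its synonyms is not described here and is not reproduced by this sketch.

```python
# Hypothetical, hand-made synonym table used only for this illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return one list of alternatives per query word.

    Only words preceded by a tilde are expanded with their synonyms;
    all other words are kept as they are.
    """
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([token])
    return groups


def to_boolean_string(groups):
    """Write the expanded query with explicit OR between the alternatives."""
    parts = []
    for group in groups:
        if len(group) > 1:
            parts.append("(" + " OR ".join(group) + ")")
        else:
            parts.append(group[0])
    return " ".join(parts)


if __name__ == "__main__":
    print(to_boolean_string(expand_query("cheap ~car rental")))
```

Only the word marked with the tilde is widened, so the searcher keeps control over which concept is expanded.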

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches until 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
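Recall and precision, the two measures mentioned above, follow the standard definitions from information retrieval: precision is the fraction of the retrieved items that are relevant, and recall is the fraction of the relevant items that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items that exist
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: 4 documents retrieved, 3 relevant documents exist,
    # 2 of the retrieved documents are relevant.
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]))
```

In practice recall can only be estimated, because the total number of relevant documents on the Internet is not known.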

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
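The relations in this scheme can be sketched as one program that forwards the same query to several search systems at the same time. The two engine functions below are hypothetical stand-ins that return fixed example.org addresses; a real meta-search system would contact the search systems over the Internet instead.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    """Hypothetical stand-in for Internet search system 1."""
    return ["http://example.org/one/" + query, "http://example.org/shared"]


def engine_two(query):
    """Hypothetical stand-in for Internet search system 2."""
    return ["http://example.org/shared", "http://example.org/two/" + query]


def meta_search(query, engines):
    """Send the same query to every engine in parallel.

    engines maps an engine name to a function that takes a query
    and returns a ranked list of URLs. The result maps each engine
    name to the list that engine returned.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    results = meta_search("hydrology", {"one": engine_one, "two": engine_two})
    for name, urls in results.items():
        print(name, urls)
```

Each engine answers from its own collected database; the meta-search system itself holds no index of WWW pages.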

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
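Clustering "on the fly" means that only the results returned for the current query are analysed. The sketch below is a crude heuristic written for illustration; it is not the method used by Vivisimo, which is not described here. It groups result snippets under the most widely shared word that is neither a stop word nor part of the query. The stop word list and the snippets are assumptions.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stop word list for this illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "was"}


def terms(text, query):
    """Return the distinct words of a snippet, without stop words and query words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - set(query.lower().split())


def cluster_results(snippets, query):
    """Group result snippets under a heading chosen from the results themselves.

    The heading of a snippet is the word it shares with the largest number
    of other snippets; snippets that share no word go under "other".
    """
    doc_terms = [terms(snippet, query) for snippet in snippets]
    frequency = Counter(term for found in doc_terms for term in found)
    clusters = defaultdict(list)
    for snippet, found in zip(snippets, doc_terms):
        shared = [term for term in found if frequency[term] > 1]
        label = max(shared, key=lambda term: (frequency[term], term)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    snippets = [
        "Blaise Pascal was a French philosopher",
        "Pascal the philosopher and mathematician",
        "Pascal programming language tutorial",
        "Compiler for the Pascal programming language",
    ]
    for heading, group in cluster_results(snippets, "pascal").items():
        print(heading, group)
```

Nothing is prepared before the search: the headings are derived from the result list itself, at query time.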

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
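The integration of several result lists with removal of repeated results can be sketched as follows. Treating addresses with and without "www." or a trailing slash as the same result is an assumption of this sketch, and alternating between the lists is only one possible way to merge them.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a key under which repeated results can be recognised.

    Assumption of this sketch: the protocol, a leading "www." and a trailing
    slash make no difference, and host names are case-insensitive.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def merge_results(result_lists):
    """Merge ranked result lists, alternating between them and dropping repeats."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:  # this list is shorter than the others
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


if __name__ == "__main__":
    list_1 = ["http://www.dogpile.com/", "http://example.org/a"]
    list_2 = ["http://dogpile.com", "http://example.org/b"]
    print(merge_results([list_1, list_2]))
```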

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
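The core of such a tracking service can be sketched as a comparison between what was seen at the previous visit and what is found now. This is a minimal sketch under assumptions: the page contents are passed in as text (fetching them from the WWW and sending the alert by email are left out), and a page counts as changed whenever its text differs at all.

```python
import hashlib


def fingerprint(page_content):
    """Return a short fixed-length summary of the text of a page."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_for_updates(stored, current_pages):
    """Compare the current pages with the fingerprints stored at the last visit.

    stored maps a URL to the fingerprint recorded earlier; it is updated in place.
    current_pages maps a URL to the text of the page as it is now.
    Returns (new_urls, changed_urls).
    """
    new_urls, changed_urls = [], []
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls


if __name__ == "__main__":
    stored = {}
    print(check_for_updates(stored, {"http://example.org/news": "version 1"}))
    print(check_for_updates(stored, {"http://example.org/news": "version 2"}))
```

Only the fingerprints have to be kept between two visits, not the full text of every tracked page.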

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
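
The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration: the class names, the fields and the matching rule (a keyword occurs in the title, or the subject category is listed in the profile) are assumptions, not the method of any real bookshop.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical user profile stored on the server of an alerting service."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


@dataclass
class Book:
    title: str
    category: str


def matches(profile, book):
    """True when a profile keyword occurs in the title or the category is in the profile."""
    title_words = set(book.title.lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    return keyword_hit or book.category in profile.categories


def alerts(profile, new_books):
    """Titles of the newly published books that fit the interest profile."""
    return [b.title for b in new_books if matches(profile, b)]


# Illustrative data only.
profile = InterestProfile("reader", {"fisheries"}, {"Marine science"})
new_books = [
    Book("Coastal Fisheries Management", "Economics"),
    Book("Tides and Currents", "Marine science"),
    Book("Alpine Flowers", "Botany"),
]
print(alerts(profile, new_books))
```

A real service would run such a check each time its catalogue is updated and send the matching titles by email.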

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
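
Simultaneous searching can be sketched as follows. The three target functions are hypothetical stand-ins for remote catalogues (a real meta-search service would send the query over the network); the sketch only shows the parallel querying, the merging of results, and the dependence on the availability of each target.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote catalogues; they ignore the query
# and return fixed titles, purely for illustration.
def library_a(query):
    return ["Marine Ecology", "Ocean Currents"]


def bookshop_b(query):
    return ["Ocean Currents", "Coastal Fisheries"]


def offline_c(query):
    raise ConnectionError("catalogue unavailable")


def meta_search(query, targets):
    """Query all targets in parallel and merge the titles, skipping targets that fail."""
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
    # Leaving the "with" block waits until every target has answered or failed.
    for name, future in futures.items():
        try:
            for title in future.result():
                if title not in merged:  # remove duplicates across catalogues
                    merged.append(title)
        except ConnectionError:
            failed.append(name)
    return merged, failed


targets = {"library_a": library_a, "bookshop_b": bookshop_b, "offline_c": offline_c}
titles, unavailable = meta_search("ocean", targets)
print(titles, unavailable)
```

Only the query options that every target supports can be offered in such a design, which is one reason why the search options of meta-search services are limited.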

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes & Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
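
How thesaurus relations can help to formulate a query is sketched below. The miniature thesaurus, its entries and the OR syntax of the resulting query are assumptions for illustration only; real thesaurus systems and search engines differ.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus its synonyms (UF) and narrower terms (NT), joined with OR."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Put multi-word terms between quotes so that they are searched as phrases.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term from the thesaurus instead of a vague one mainly increases precision.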

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
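
Both directions of citation searching can be sketched with a small table of citations. The document names and the data are hypothetical; a real citation index covers millions of reference lists.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "seed"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, step by step (snowball effect)."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            queue.extend(cites.get(doc, []))
    return found


def citing(seed, cites):
    """What a citation index provides: the more recent documents that cite the seed."""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print(snowball("seed", CITES))
print(citing("seed", CITES))
```

The length of the list returned by `citing` is the number of citations received by the seed document within this data set.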

211

***-

Information retrieval:
using citations (scheme)
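[Scheme: a time axis running from past to future, with the seed document in the middle.
Snowball citation searching leads from the seed document to the past;
citation indexing leads from the seed document to the future.]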

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
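
The variants listed above can be generated automatically, as in the following sketch. The function name and the list of default page names are assumptions; a web server may use other default file names.

```python
def url_variants(url):
    """Return the usual spellings of a URL: with index page, with trailing slash, and bare."""
    # Assumed default page names; actual servers may use others.
    for index in ("index.html", "index.htm"):
        if url.endswith("/" + index):
            url = url[: -len(index)]
            break
    base = url.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system, and the results can be combined.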

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
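
The difference between the two methods can be sketched on a few locally stored pages. The page names and contents are hypothetical, and a real search engine works on its own index rather than on raw HTML; the sketch only shows that a search in hyperlink fields and a "normal" search for the URL string can find different citing pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the targets of all hyperlinks (href attributes of <a> tags)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for key, value in attrs if key == "href" and value)


def citing_pages(pages, target):
    """Split citing pages into those that link to the target and those that only mention it."""
    by_link, by_text = [], []
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            by_link.append(name)
        elif target in html:
            by_text.append(name)
    return by_link, by_text


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
# Hypothetical pages, for illustration only.
PAGES = {
    "a.html": '<p>See <a href="%s">this page</a>.</p>' % TARGET,
    "b.html": "<p>Mentioned at %s in passing.</p>" % TARGET,
    "c.html": "<p>Unrelated page.</p>",
}
print(citing_pages(PAGES, TARGET))
```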

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 186

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.
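
The difference between alphabetical sorting and sorting by received links can be shown in a few lines. This is only a sketch of the "simply stated" idea above, not the method actually used by Google; the link graph and site names are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "site-c": [],
    "site-d": ["site-c", "site-b"],
}


def rank_by_inlinks(links, entries):
    """Sort directory entries by number of received links, most linked first."""
    inlinks = Counter(target for targets in links.values() for target in targets)
    # Ties are broken alphabetically so that the order is reproducible.
    return sorted(entries, key=lambda site: (-inlinks[site], site))


entries = ["site-a", "site-b", "site-c"]
print(sorted(entries))                   # alphabetical order
print(rank_by_inlinks(LINKS, entries))   # order by received links
```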

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow the user to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
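
The two components described above (a machine-made index and a search function on top of it) can be sketched as follows. The pages are hypothetical stand-ins for what a robot would have collected, and combining the query words with AND is a choice made for this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for the texts collected by a "robot".
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Encyclopedia of hydrology",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prebuilt index only; no page is fetched at query time, which is why such systems respond quickly but can be out of date.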

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
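
Both criteria above (many pointing pages, and "important" pointing pages) can be combined in an iterative score. The sketch below is a simplified, PageRank-style calculation on a hypothetical link graph; the damping value and the number of iterations are assumptions, and the code is not the algorithm as implemented by Google.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "p1": ["popular", "other"],
    "p2": ["popular"],
    "p3": ["popular"],
    "popular": ["endorsed"],
    "endorsed": [],
    "other": [],
}


def link_scores(links, damping=0.85, iterations=50):
    """Give each page a score that depends on the scores of the pages linking to it."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * score[page] / n
                for other in pages:
                    new[other] += share
        score = new
    return score


scores = link_scores(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph "endorsed" and "other" each receive exactly one link, but "endorsed" is linked from a much-cited page and therefore ends up with the higher score.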

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
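
What such an automatic expansion amounts to can be sketched with a small synonym table. The table, the function names and the AND between query words are all assumptions made for illustration; the real system uses its own, much larger and undisclosed vocabulary.

```python
# Hypothetical synonym table; a real system would use a far larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return one list of alternatives per query word; only ~words are expanded."""
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([token])
    return groups


def to_boolean(groups):
    """Write the expanded query as an explicit Boolean expression."""
    parts = []
    for group in groups:
        parts.append(group[0] if len(group) == 1 else "(" + " OR ".join(group) + ")")
    return " AND ".join(parts)


print(to_boolean(expand_query("cheap ~car rental")))
# prints: cheap AND (car OR automobile OR vehicle) AND rental
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table contains synonyms for it.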

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
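
Recall and precision can be computed directly once it is known which documents are relevant. The document identifiers below are hypothetical, and the two result sets only illustrate the claim; they are not measurements of any real system.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents are relevant to the information need.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = {"d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"}
specialised_engine = {"d1", "d2", "d3", "x1"}

print(precision_recall(general_engine, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```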

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
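
For readers who have not used truncation: the sketch below shows what a truncated search term does in a system that supports it. The `*` symbol is the convention assumed here (it is common in bibliographic databases), and the word list is hypothetical.

```python
# Hypothetical vocabulary of an index.
VOCABULARY = ["librarian", "libraries", "library", "liberty", "search", "searching"]


def exact_match(term, vocabulary):
    """Without truncation, only the word exactly as typed is found."""
    return [word for word in vocabulary if word == term]


def truncated_match(pattern, vocabulary):
    """With truncation, 'librar*' finds every word that starts with 'librar'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [word for word in vocabulary if word.startswith(stem)]
    return exact_match(pattern, vocabulary)


print(exact_match("library", VOCABULARY))
print(truncated_match("librar*", VOCABULARY))
```

Without this feature, the searcher has to type the variants (library, libraries, librarian...) explicitly in the query.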

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
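
A proximity operator such as NEAR requires that two words occur within a given number of words of each other. The sketch below assumes a window measured in words; the function name, the default window and the sample text are illustrative only.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur at most max_distance words apart."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


TEXT = ("Marine pollution has many causes. Much later the report turns to "
        "an unrelated topic, namely the biology of freshwater fish.")

print(near(TEXT, "marine", "pollution", max_distance=3))  # True
print(near(TEXT, "marine", "biology", max_distance=3))    # False
```

A plain AND query would accept this text for "marine biology", although the two words are unrelated in it; that is the loss of precision a proximity operator avoids.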

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
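
How results can be grouped on the fly, using only the texts returned for one query, can be sketched as follows. This is a deliberately naive illustration and not the method used by Vivisimo; the result titles, the stop word list and the labelling rule are assumptions.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "a", "and", "in", "to", "for"}


def cluster_results(query, titles):
    """Group result titles under the most widely shared term they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    terms_per_title = []
    for title in titles:
        terms = set(re.findall(r"[a-z]+", title.lower())) - query_words - STOPWORDS
        terms_per_title.append(terms)
    # Count in how many titles each term occurs.
    frequency = Counter(t for terms in terms_per_title for t in terms)
    clusters = defaultdict(list)
    for title, terms in zip(titles, terms_per_title):
        shared = [t for t in terms if frequency[t] > 1]
        # Label: the most frequent shared term (ties broken alphabetically).
        label = max(shared, key=lambda t: (frequency[t], t)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Blaise Pascal, philosopher and mathematician",
    "The philosopher Pascal on faith",
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal (unit of pressure)",
]

for label, group in cluster_results("pascal", TITLES).items():
    print(label, "->", group)
```

Nothing is computed before the query arrives: the grouping uses only the returned titles, which is the sense in which clustering is done "on the fly".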

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
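
The integration step described above can be sketched as merging several ranked result lists while removing repeated results. The round-robin order and the URL normalisation are assumptions of this sketch; real meta-search systems use their own, usually undocumented, merging rules.

```python
def normalise(url):
    """Reduce a URL to a comparable key (simplified: real paths can be case-sensitive)."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first occurrence of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results returned by two primary search systems.
engine_1 = ["http://www.vub.ac.be/BIBLIO/", "http://bubl.ac.uk/link/"]
engine_2 = ["http://bubl.ac.uk/link", "http://www.dmoz.org/", "http://vub.ac.be/BIBLIO"]

print(merge_results([engine_1, engine_2]))
```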

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
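
The distinction between modified and new pages can be made concrete with a sketch that compares a stored fingerprint of each page with the current text. The URLs and page texts are hypothetical, and the sketch works on texts already in memory; fetching the pages and sending the alert are left out.

```python
import hashlib


def fingerprint(page_text):
    """Short, fixed-length summary of a page; it changes when the text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """previous: {url: fingerprint} from the last visit; current: {url: page text}.
    Returns (new pages, modified pages)."""
    new_pages = sorted(url for url in current if url not in previous)
    modified = sorted(
        url for url, text in current.items()
        if url in previous and previous[url] != fingerprint(text)
    )
    return new_pages, modified


# Hypothetical state stored at the previous visit.
previous = {
    "http://example.org/news": fingerprint("Old announcement"),
    "http://example.org/about": fingerprint("About this site"),
}
# Hypothetical texts found at the current visit.
current = {
    "http://example.org/news": "New announcement",
    "http://example.org/about": "About this site",
    "http://example.org/report": "A page that did not exist before",
}

print(detect_changes(previous, current))
```

Tracking changes needs only the list of pages already being watched, whereas finding new pages needs a source of candidates, such as repeated queries to an Internet index.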

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?

AIM                                                     RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic    ?
To find book titles published before 1990               ?
To find a book title through a title search             ?
To find the price of a book                             ?
To be informed regularly about new books                ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
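
How a stored interest profile could be matched against newly published books can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the profile layout and the function names are hypothetical and are not taken from any bookshop system.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True when a book fits the stored keywords or subject categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(kw.lower() in title_words for kw in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

def new_book_alerts(new_books: list, profile: dict) -> list:
    """Return the titles of the new books that should trigger an alert."""
    return [book["title"] for book in new_books if matches_profile(book, profile)]

# Hypothetical profile stored on the server, and hypothetical new titles.
profile = {"keywords": ["aquaculture"], "categories": ["Oceanography"]}
new_books = [
    {"title": "Aquaculture in Europe", "category": "Agriculture"},
    {"title": "Deep Currents", "category": "Oceanography"},
    {"title": "Medieval Towns", "category": "History"},
]

alerts = new_book_alerts(new_books, profile)
print(alerts)  # ['Aquaculture in Europe', 'Deep Currents']
```

The first title matches on a keyword, the second on a subject category, and the third matches neither.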

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
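
The idea of one search action over several target systems can be sketched as below. The three "catalogues" are stand-in functions invented for the illustration; a real meta-search service would query remote systems (for instance through Z39.50 or web forms), which is not shown here.

```python
def normalise_title(title: str) -> str:
    """Lower-case a title and keep only letters, digits and single spaces."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(kept.split())

def meta_search(query: str, catalogues: dict) -> list:
    """Send one query to every catalogue and merge the answers.

    `catalogues` maps a catalogue name to a search function that returns
    a list of book titles. Duplicate titles are merged and their sources kept.
    """
    merged = {}
    for name, search in catalogues.items():
        try:
            titles = search(query)
        except Exception:
            continue  # an unavailable target system simply contributes nothing
        for title in titles:
            entry = merged.setdefault(normalise_title(title),
                                      {"title": title, "sources": []})
            entry["sources"].append(name)
    return list(merged.values())

# Hypothetical stand-ins for remote library and bookshop databases.
def library_a(query):
    return ["Marine Ecology", "Coastal Management"]

def bookshop_b(query):
    return ["Marine ecology.", "Ocean Data"]

def broken_c(query):
    raise ConnectionError("target system is down")

results = meta_search("marine", {"Library A": library_a,
                                 "Bookshop B": bookshop_b,
                                 "Catalogue C": broken_c})
for record in results:
    print(record["title"], record["sources"])
```

The sketch reflects both points made above: coverage grows with every target that answers, while the query itself has to stay simple enough for all targets to accept it.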

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                     RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic    Library of Congress, British Library, (Amazon)
To search for book titles published before 1990         national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                            Library of Congress, British Library, Infoball
To find the price of a book                             Global Books in Print, Infoball, online bookshops
To be informed regularly about new books                Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
first use one or several thesaurus systems to find additional
and more suitable words and terms;
then the searcher can use these words and terms to
formulate a query for that database (to increase recall
and precision).
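
The use of thesaurus relations to formulate a query can be sketched as follows. The miniature thesaurus, its entries and the Boolean OR syntax are assumptions made for the illustration; real thesaurus systems and search engines differ in both content and query syntax.

```python
# A tiny, hypothetical thesaurus: each term lists its BT, NT, RT and UF entries.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}

def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Build a Boolean OR query from a term and selected related terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea")
print(query)  # sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms with OR tends to increase recall; replacing a vague term by a narrower one found in the thesaurus tends to increase precision.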

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
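
Both directions of citation searching can be sketched on a small, invented set of documents. The document identifiers and the reference lists are hypothetical; a real citation index is of course far larger and is queried through its own interface.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed-2000": ["a-1990", "b-1995"],
    "b-1995": ["a-1990", "c-1985"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "b-1995"],
}

def snowball(seed: str) -> set:
    """Follow reference lists backwards in time, repeatedly (snowball effect)."""
    found, to_visit = set(), [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in REFERENCES.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found

def citing_documents(seed: str) -> set:
    """Look up more recent documents that cite the seed, as a citation index does."""
    return {doc for doc, refs in REFERENCES.items() if seed in refs}

older = sorted(snowball("seed-2000"))
newer = sorted(citing_documents("seed-2000"))
print(older)  # ['a-1990', 'b-1995', 'c-1985']
print(newer)  # ['d-2003', 'e-2004']
```

The first function only needs the documents themselves, whereas the second needs an index built over many publications, which is why a citation index is required for the forward direction.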

211

***-

Information retrieval:
using citations (scheme)
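Scheme on a time axis (Past, Now, Future):
Seed document: located at “Now”
Snowball citation searching: from the seed document towards the Past
Citation indexing: from the seed document towards the Future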

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
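
Estimating citation counts for articles and for authors from the results of a citation search can be sketched as below. The article identifiers, author names and reference lists are invented for the illustration, and the counting rule (each citing article counts once per cited article) is an assumption.

```python
from collections import Counter

# Hypothetical records: who wrote each article, and which articles each one cites.
AUTHORS = {"p1": "Janssens", "p2": "Janssens", "p3": "Peeters"}
CITES = {"p4": ["p1", "p3"], "p5": ["p1", "p2"], "p6": ["p1"]}

def citations_per_article(cites: dict) -> Counter:
    """Count how many citing articles refer to each cited article."""
    counts = Counter()
    for reference_list in cites.values():
        counts.update(set(reference_list))  # count each citing article once
    return counts

def citations_per_author(cites: dict, authors: dict) -> Counter:
    """Add up the article counts for each author."""
    counts = Counter()
    for article, n in citations_per_article(cites).items():
        counts[authors.get(article, "unknown")] += n
    return counts

article_counts = citations_per_article(CITES)
author_counts = citations_per_author(CITES, AUTHORS)
print(article_counts["p1"])       # 3
print(author_counts["Janssens"])  # 4
```

Such counts are only estimates: they depend entirely on which citing documents the database covers.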

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
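
Generating the variants to search for can be sketched with a small helper. It assumes that the default document of the site is called index.html (or index.htm), which is common but not universal; the function name is hypothetical.

```python
def url_variants(url: str) -> list:
    """Return the common spellings under which the same page may be linked."""
    base = url
    # Strip a default document name, if present (assumed to be index.html/.htm).
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("http://web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link query, and the results combined, so that links written in any of the three forms are found.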

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know,
then we can NOT ONLY use an Internet index /
search engine with specialised searching in the hyperlink
fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 187

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
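
The mechanism described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical set of pages that a "robot" has already collected (the example.org URLs and texts are invented for illustration), builds a word index from their texts, and answers a query from that index instead of from the live Internet. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
PAGES = {
    "http://example.org/tides": "Tides and currents in the North Sea",
    "http://example.org/fish": "Fish stocks in the North Sea",
    "http://example.org/alps": "Hydrology of alpine rivers",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "north sea")))
```

The query is answered from the prebuilt index only, which is why a page that was changed or added after the last visit of the robot cannot be found yet.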

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
at most 10 words could be used in a query.)
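
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis sketch. This is not Google's actual algorithm. The function name `link_rank`, the damping value and the small link graph are assumptions chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis score, computed by repeated redistribution.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: a and b point to c, and c points to d.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
SCORES = link_rank(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this small graph, page c scores higher than a and b because two pages point to it, and page d scores well although only one page points to it, because that page (c) is itself "important".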

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
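
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical and only illustrate the idea; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            if len(group) > 1:
                parts.append("(" + " OR ".join(group) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("rent ~cheap car"))
```

Only the word marked with the tilde is expanded; the other words of the query are left as they are.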

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
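
Recall and precision, the two measures named above, can be computed as in the following small sketch. The document identifiers are invented, and in practice the complete set of relevant documents is rarely known, so the recall value is usually only an estimate.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]
PRECISION, RECALL = precision_recall(RETRIEVED, RELEVANT)
print(PRECISION, RECALL)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were retrieved (recall about 0.67).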

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
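
For readers not familiar with truncation: a retrieval system that supports it matches all words that start with the given word stem. The sketch below illustrates the idea on a small, invented vocabulary; the `*` symbol and the function name are assumptions. When a system lacks this feature, the searcher can combine the word variants manually with OR.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical vocabulary of index terms.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]
VARIANTS = truncation_match("librar*", VOCABULARY)

# Manual substitute for truncation: combine the variants with OR.
MANUAL_QUERY = " OR ".join(VARIANTS)
print(MANUAL_QUERY)
```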

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
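
Clustering "on the fly" can be illustrated with a very simple sketch that groups the retrieved hits under words that several of their descriptions share. This is not the method of Vivisimo, which is not described here. The stop word list, the invented hits and the function `cluster_results` are assumptions for illustration only.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "his"}


def cluster_results(query, results, max_clusters=3):
    """Group (url, snippet) hits under frequent words that are not in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    words_per_result = []
    counts = Counter()
    for url, snippet in results:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words -= STOPWORDS | query_words
        words_per_result.append((url, words))
        counts.update(words)
    # Cluster labels: words that occur in at least two hits.
    candidates = sorted(counts, key=lambda w: (-counts[w], w))
    labels = [w for w in candidates if counts[w] >= 2][:max_clusters]
    clusters = defaultdict(list)
    for url, words in words_per_result:
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(url)
    return dict(clusters)


# Hypothetical hits for the ambiguous query "pascal".
HITS = [
    ("u1", "Blaise Pascal, philosopher and mathematician"),
    ("u2", "Pascal programming language tutorial"),
    ("u3", "The philosopher Pascal and his writings"),
    ("u4", "Free compiler for the Pascal programming language"),
]
CLUSTERS = cluster_results("pascal", HITS)
print(CLUSTERS)
```

Only the retrieved hits are analysed, at search time. No document is processed before the query is made.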

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
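
The integration of results with removal of repeated results can be sketched as follows. The two "engines" are hypothetical stand-ins that return fixed lists of invented URLs; a real meta-search system would send the query to external search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Query all engines in parallel and merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


MERGED = meta_search("marine biology", [engine_one, engine_two])
print(MERGED)
```

The engines are queried at the same time ("multi-threaded"), and a URL returned by more than one engine is shown only once.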

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
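
How such a tracking system can distinguish modified pages from new pages is sketched below. The sketch assumes that the system stores a fingerprint of each page at every visit; the page contents, the short URLs and the function names are invented for illustration, and fetching the pages from the WWW is left out.

```python
import hashlib


def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current content of pages.

    Returns (changed URLs, new URLs, updated fingerprints).
    """
    changed, new, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
    return sorted(changed), sorted(new), updated


# Hypothetical state stored at the previous visit, and the pages found now.
PREVIOUS = {"u1": fingerprint("old text"), "u2": fingerprint("same text")}
CURRENT = {"u1": "new text", "u2": "same text", "u3": "brand new page"}
CHANGED, NEW, UPDATED = detect_changes(PREVIOUS, CURRENT)
print(CHANGED, NEW)
```

An alert service would run such a comparison regularly and send the lists of changed and new pages to the subscriber by email.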

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
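How a stored interest profile could be matched against newly published books can be sketched as follows. This is only an illustration: the profile, the book records and the matching rule (a keyword in the title, or a matching subject category) are assumptions, not the method of any particular bookshop.

```python
# Hypothetical interest profile stored on the server for one user.
PROFILE = {
    "keywords": {"aquaculture", "fisheries"},
    "categories": {"Marine science"},
}

# Hypothetical records of newly published books.
NEW_BOOKS = [
    {"title": "Sustainable aquaculture in Europe", "category": "Agriculture"},
    {"title": "Tides and currents", "category": "Marine science"},
    {"title": "Medieval poetry", "category": "Literature"},
]

def matches_profile(book, profile):
    """A book matches when a profile keyword occurs in its title
    or when its subject category is one of the profile categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit

alerts = [book["title"] for book in NEW_BOOKS if matches_profile(book, PROFILE)]
print(alerts)
```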

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
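The principle of simultaneous searching can be sketched in a few lines. The three target systems below are hypothetical in-memory stand-ins (a real meta-search service would query remote catalogues, for instance through Z39.50), and merging on ISBN is an assumption made for the example.

```python
def search_library(query):
    # Hypothetical library catalogue held in memory.
    records = [{"isbn": "111", "title": "Marine ecology"},
               {"isbn": "222", "title": "Coastal management"}]
    return [r for r in records if query.lower() in r["title"].lower()]

def search_bookshop(query):
    # Hypothetical bookshop database held in memory.
    records = [{"isbn": "111", "title": "Marine ecology"},
               {"isbn": "333", "title": "Marine biology"}]
    return [r for r in records if query.lower() in r["title"].lower()]

def search_offline(query):
    # Simulates a target system that is not available at search time.
    raise ConnectionError("target system not reachable")

def meta_search(query, targets):
    """Send one query to every target and merge the results on ISBN."""
    merged, unavailable = {}, []
    for name, search in targets.items():
        try:
            records = search(query)
        except ConnectionError:
            unavailable.append(name)   # the result depends on availability
            continue
        for rec in records:
            entry = merged.setdefault(
                rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(name)
    return merged, unavailable

results, failed = meta_search("marine", {
    "library": search_library,
    "bookshop": search_bookshop,
    "offline": search_offline,
})
for isbn, entry in results.items():
    print(isbn, entry["title"], entry["sources"])
print("unavailable:", failed)
```

Only a plain keyword query is passed on, which illustrates why the search options of such a service are limited to what all target systems have in common.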

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal,
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a Term:
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
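The approach described above can be sketched as a small query expansion routine. The thesaurus entry for "sea" is invented for the example, and the OR syntax with quoted phrases is an assumption; the query language of the database actually searched may differ.

```python
# Hypothetical thesaurus entry using the standard relation codes.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["inland sea"],      # narrower term
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym for which "sea" is used
    },
}

def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found in the thesaurus.

    Adding synonyms (UF) and narrower terms (NT) tends to increase recall;
    adding broader or related terms would widen the search further.
    """
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "NT", "RT")))
```

Restricting the expansion to synonyms and narrower terms is a design choice that keeps the query close to the original concept; which relations to include depends on whether recall or precision matters more for the search at hand.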

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
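Both directions of citation searching can be illustrated with a tiny, hypothetical citation network. The document identifiers are invented; the sketch only shows that following reference lists leads to older work, while an inverted index ("cited by") leads to more recent work.

```python
from collections import defaultdict

# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "D2005": ["C2003"],
    "C2003": ["B2000", "A1995"],
    "B2000": ["A1995"],
    "A1995": [],
}

def snowball(seed, references):
    """Follow reference lists backwards in time, repeatedly (snowball effect)."""
    found, todo = [], list(references.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found:
            found.append(doc)
            todo.extend(references.get(doc, []))
    return found

def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return cited_by

seed = "C2003"
older = snowball(seed, REFERENCES)
citation_index = build_citation_index(REFERENCES)
newer = citation_index[seed]
print("older publications reached from the seed:", older)
print("more recent publications citing the seed:", newer)
```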

211

***-

Information retrieval:
using citations (scheme)
[Scheme: time axis (Past, Now, Future) with a Seed document; snowball citation searching leads from the seed to the past, citation indexing leads from the seed to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
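Once a citation search has produced a list of citing and cited documents, the counts per article and per author follow from simple tallying. The records below are hypothetical, and the counts obtained in practice are only an estimate because they depend on the coverage of the database searched.

```python
from collections import Counter

# Hypothetical result of a citation search:
# (citing document, cited document, author of the cited document)
CITATIONS = [
    ("P4", "P1", "Author A"),
    ("P5", "P1", "Author A"),
    ("P5", "P2", "Author A"),
    ("P6", "P3", "Author B"),
]

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited, _ in CITATIONS)

# Number of citations received by each author (summed over their articles).
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article.most_common())
print(per_author.most_common())
```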

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
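The variants listed above can be generated automatically before they are submitted one by one to a search system. This is a minimal sketch: the function name and the default page name `index.html` are assumptions, and the operator needed to search for links is specific to each search system, so it is not included here.

```python
def link_query_variants(site_path, index_page="index.html"):
    """Return the common ways in which the same web site may be linked to:
    with the index page, with a trailing slash, and without either."""
    base = site_path.rstrip("/")
    suffix = "/" + index_page
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return [f"{base}/{index_page}", f"{base}/", base]

variants = link_query_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Whichever form of the address is taken as the starting point, the function returns the same three variants, so no linking page is missed because of a small difference in the way the URL was written.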

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 188

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
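
The "simply stated" rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names and the link list are hypothetical, and counting inbound links is a simplification, not the link analysis that Google actually applies.

```python
from collections import Counter

def rank_by_inlinks(directory_entries, links):
    """Sort directory entries by the number of distinct pages linking to them."""
    inlinks = Counter()
    for source, target in set(links):      # set(): a repeated link counts once
        if source != target:               # ignore links from a site to itself
            inlinks[target] += 1
    # Most-linked first; ties fall back to alphabetical order.
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))

# Hypothetical directory category and hypothetical links found on the WWW.
entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
    ("x.example", "beta.example"),  # duplicate link, counted once
]
ranked = rank_by_inlinks(entries, links)
print(ranked)
```

An alphabetical listing would put alpha.example first; the link-based ordering moves the most linked site to the top.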

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: starting from the complete WWW, a global subject directory can lead to
- a directory limited to sources in/of a country or region
- a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
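
The mechanism described above can be illustrated with a minimal sketch of an inverted index in Python. The page texts and URLs are hypothetical stand-ins for what a "robot" would collect, and the implicit AND between query words is an assumption of this sketch, not a statement about any particular search engine.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine science")
print(hits)
```

The query is answered from the prepared index alone; no page is fetched at query time, which is why such systems can respond quickly.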

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
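
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be sketched with a generic iterative link-scoring scheme. This is a minimal illustration, assuming a damping factor of 0.85 and a hypothetical four-page graph; it is not presented as the algorithm that Google actually uses.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores high when many pages, or
    high-scoring pages, link to it. `graph` maps a page to the pages it links to."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: A, B and D all point to C; C points back to A.
graph = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = link_scores(graph)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy graph, page A ends up above B and D although each of them has the same single outgoing link: A is the only one that receives a link, and that link comes from the highest-scoring page.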

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
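
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the OR-group output format are assumptions made for this illustration only; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("cheap ~car rental")
print(expanded)
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table knows synonyms for it.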

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
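
Why combining indexes helps can be shown with simple set arithmetic. The toy "WWW" of 10 pages and the contents of the two indexes are hypothetical numbers chosen for the illustration, not measurements.

```python
def coverage(index_urls, all_urls):
    """Fraction of the known pages that an index contains."""
    return len(set(index_urls) & set(all_urls)) / len(all_urls)

# Hypothetical toy "WWW" of 10 pages and two indexes that each cover a part of it.
www = {f"page{i}" for i in range(10)}
index_a = {"page0", "page1", "page2", "page3"}
index_b = {"page2", "page3", "page4", "page5", "page6"}

# Searching both indexes reaches the union of what they cover.
combined = index_a | index_b

print(coverage(index_a, www), coverage(index_b, www), coverage(combined, www))
```

The combined coverage is higher than that of either index alone, but lower than the sum of the two, because the indexes overlap.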

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
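
Recall and precision, mentioned above, can be computed as in the following sketch. The document identifiers and the two result lists are hypothetical; they only illustrate how a specialised system could score better on both measures.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list and a set of relevant items.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: d1..d4 are the relevant documents, x1..x6 are noise.
relevant = {"d1", "d2", "d3", "d4"}
general = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised = ["d1", "d2", "d3", "x1"]

print(precision_recall(general, relevant))
print(precision_recall(specialised, relevant))
```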

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

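Meta-search systems:
scheme 1
[Diagram: the user (in / out) works at a client computer with a WWW client program, which connects to a WWW server computer, which in turn connects through the Internet / WWW to the WWW server computers with Internet search systems.]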

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (in / out) works at a client computer with a multi-threaded Internet search client program, which connects through the Internet / WWW directly to the WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
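[Diagram: the user (in / out) works at a client computer with a WWW client program, which connects to a WWW server computer, which in turn connects through the Internet / WWW to the WWW server computers with Internet search systems.]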

100

**--

Meta-search systems:
relations
[Diagram: the user queries an Internet meta-search system, which queries Internet search system 1 and Internet search system 2; each of these search systems has its own collected database (1 and 2), built from the WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
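
Clustering "on the fly" means that the groups are derived from the retrieved results themselves, at query time. The sketch below does this in a very naive way, by grouping result snippets under the word they share most often; the snippets, the stopword list and the labelling rule are all assumptions of this illustration, and it is not the method that Vivisimo uses.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "for", "is", "was"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query word they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokenised = []
    for snippet in snippets:
        found = re.findall(r"[a-z]+", snippet.lower())
        tokenised.append({w for w in found
                          if w not in STOPWORDS and w not in query_words})
    # Count in how many snippets each candidate label occurs.
    doc_freq = Counter(w for ws in tokenised for w in ws)
    clusters = defaultdict(list)
    for snippet, ws in zip(snippets, tokenised):
        if ws:
            # Highest document frequency wins; ties are broken alphabetically.
            label = min(ws, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pascal the philosopher on religion",
]
clusters = cluster_results("pascal", snippets)
print(clusters)
```

Nothing is computed before the search: the labels are chosen only after the result list is known, which is the point made above.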

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: the user (in / out) works at a client computer with a multi-threaded Internet search client program, which connects through the Internet / WWW directly to the WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
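
The integration step with removal of repeated results can be sketched as follows. The round-robin interleaving and the URL normalisation rules (ignore case of the host, a leading "www." and a trailing slash) are assumptions of this sketch; real meta-search systems may merge and compare results quite differently.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(*result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/doc", "http://example.com/page/"]
merged = merge_results(engine_1, engine_2)
print(merged)
```

Five raw results are reduced to three, because two of them point to pages already listed by the other system.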

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet / WWW:
- static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes
- telnet, ftp, ...
- databases and file archives accessible through the Internet (CGI, ASP, ...)
- information accessible only when passwords are used
- rapidly changing information, such as news
- Word files
- PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
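
The tracking idea described above can be sketched in a few lines. This is only a minimal illustration, not how Copernic or any alert service actually works: it assumes that a page counts as modified when a hash of its text differs from the hash stored at the previous visit. The function names and the example URLs are hypothetical.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of the page text."""
    normalised = " ".join(page_text.split())  # ignore whitespace-only changes
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored: dict, current_pages: dict) -> dict:
    """Compare the current page texts with the stored fingerprints.

    stored        maps URL -> fingerprint from the previous visit
    current_pages maps URL -> page text fetched now
    Returns URL -> "new", "modified" or "unchanged".
    """
    report = {}
    for url, text in current_pages.items():
        if url not in stored:
            report[url] = "new"
        elif stored[url] != fingerprint(text):
            report[url] = "modified"
        else:
            report[url] = "unchanged"
    return report


# Hypothetical example data; a real service would fetch the pages itself.
previous = {"http://example.org/a": fingerprint("Tides in the North Sea")}
now = {
    "http://example.org/a": "Tides in the North Sea, updated",
    "http://example.org/b": "A new page on coastal erosion",
}
changes = detect_changes(previous, now)
```

An alert service would run such a comparison at regular intervals and send an email for every URL reported as "new" or "modified".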

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in the U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in the U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
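
Matching new books against a stored interest profile can be sketched as follows. This is a simplified assumption about how such a service might work, not a description of Amazon's system; the record fields, the profile layout and the function names are hypothetical.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """Return True when a new book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def alerts_for(new_books: list, profile: dict) -> list:
    """Select the titles of the new books that the user should hear about."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical profile and book records, for illustration only.
profile = {"keywords": ["aquaculture"], "categories": ["Marine science"]}
new_books = [
    {"title": "Aquaculture in Europe", "category": "Agriculture"},
    {"title": "Deep currents", "category": "Marine science"},
    {"title": "Medieval trade routes", "category": "History"},
]
alerts = alerts_for(new_books, profile)
```

The first book is selected through a keyword, the second through its subject category, which mirrors the two forms of profile mentioned above.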

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
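
The principle of simultaneous searching can be sketched as below. This is a minimal illustration under stated assumptions, not the way Infoball or the Karlsruher Virtueller Katalog is built: each target catalogue is represented by a hypothetical function that returns records with an ISBN and a title, and a target that cannot be reached is simply reported as unavailable.

```python
def meta_search(query: str, targets: dict):
    """Send one query to several catalogues and merge the results.

    targets maps a catalogue name to a function(query) -> list of records.
    Returns (merged records keyed by ISBN, names of unavailable targets).
    """
    merged = {}
    unavailable = []
    for name, search in targets.items():
        try:
            records = search(query)
        except OSError:  # e.g. the target system cannot be reached
            unavailable.append(name)
            continue
        for rec in records:
            entry = merged.setdefault(
                rec["isbn"], {"title": rec["title"], "found_in": []}
            )
            entry["found_in"].append(name)
    return merged, unavailable


# Hypothetical stand-ins for real catalogues; ISBNs are placeholders.
def catalogue_a(query):
    return [{"isbn": "111", "title": "Coastal ecology"}]


def catalogue_b(query):
    return [
        {"isbn": "111", "title": "Coastal ecology"},
        {"isbn": "222", "title": "Estuaries"},
    ]


def catalogue_down(query):
    raise ConnectionError("target system not reachable")


merged, unavailable = meta_search(
    "coast", {"A": catalogue_a, "B": catalogue_b, "C": catalogue_down}
)
```

The sketch also shows why search options are limited: only the query form that every target understands (here a plain string) can be passed on.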

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
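
Using a thesaurus to prepare a query can be sketched as follows. The thesaurus entry below is invented for illustration and is not taken from any of the systems mentioned in these slides; the OR syntax with quoted phrases is also an assumption, since each database has its own query language.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],
        "NT": ["Atlantic Ocean"],
        "RT": ["tide"],
        "UF": ["sea"],
    },
}


def expand_query(term: str, thesaurus: dict, relations=("UF", "NT")) -> str:
    """Build an OR query from a term and selected thesaurus relations.

    Adding synonyms (UF) and narrower terms (NT) aims at higher recall;
    leaving out broader terms (BT) avoids losing too much precision.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("ocean", THESAURUS)
```

With the entry above, `query` becomes `"ocean" OR "sea" OR "Atlantic Ocean"`; a term that is not in the thesaurus is returned unchanged as a single quoted phrase.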

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
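
Both directions of citation searching can be sketched on a small citation graph. The document identifiers and the citation data below are invented for illustration; a real citation index holds millions of such links.

```python
from collections import deque

# Hypothetical data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "recent1": ["seed"],
    "recent2": ["recent1", "old2"],
}


def snowball(seed: str, cites: dict) -> list:
    """Follow reference lists backwards in time, starting from the seed."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for ref in cites.get(doc, []):
            if ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append(ref)
    return found


def cited_by(target: str, cites: dict) -> list:
    """Look up which documents cite the target (what a citation index offers)."""
    return sorted(doc for doc, refs in cites.items() if target in refs)


older_publications = snowball("seed", CITES)
newer_publications = cited_by("seed", CITES)
```

The snowball search only needs the reference lists of the documents themselves, whereas the forward search requires an index over all citing documents, which is why citation indexes are needed for that direction.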

211

***-

Information retrieval:
using citations (scheme)
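[Diagram: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document towards the past; citation indexing leads from the seed document towards the future.]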

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations to a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
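
Generating the variants listed above can be sketched as follows. The sketch assumes that the default document of the web site is named index.html (or index.htm), which is common but not guaranteed; the function name is hypothetical, and the search-engine-specific syntax for link searching is deliberately left out.

```python
def url_variants(url: str) -> list:
    """Return the common spellings of a URL that lead to the same page."""
    base = url
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]  # strip the default document
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("http://web-server-computer.country/website/")
```

Each variant can then be submitted as a separate link search, and the results combined.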

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Since then it has added many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
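
The mechanism described above can be sketched as a tiny inverted index. This is only a minimal illustration: the "robot" is replaced by a hypothetical, hard-coded set of pages, and the names PAGES, build_index and search are invented for this example. Real Internet indexes are far larger and more sophisticated.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" would have collected (URL -> text).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

Note that the query is answered from the prepared index only; the pages themselves are not visited at query time, which is why such a system can answer quickly but may be out of date.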

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but
the first part is also indexed for many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
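
The idea of ranking by links can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not the algorithm that Google actually runs: the link graph LINKS is hypothetical, and the damping factor 0.85 and the number of iterations are illustrative choices.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a small link graph.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A, B and C all point to D; D points to A.
LINKS = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
RANKS = link_rank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph D ranks highest because many pages point to it, and A ranks above B and C because the "important" page D points to it.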

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
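
What such an automatic expansion amounts to can be sketched as follows. The synonym table SYNONYMS and the function expand_query are hypothetical and serve only as an illustration; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would rely on a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("~cheap ~car rental brussels"))
```

Words without a tilde are left untouched, so the searcher decides which concepts are broadened.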

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
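
Recall and precision, as mentioned above, can be computed as follows when the set of relevant documents is known. The document identifiers in this sketch are hypothetical; in practice the complete set of relevant documents on the WWW is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents are relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d2", "d5"]
PRECISION, RECALL = precision_recall(RETRIEVED, RELEVANT)
print(PRECISION, RECALL)
```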

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
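
For readers unfamiliar with truncation: in systems that support it, a pattern such as librar* is matched against all words that start with the given stem. A minimal sketch, assuming a hypothetical word list VOCABULARY taken from some index:

```python
import fnmatch

# Hypothetical list of words that occur in an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "literature"]

def expand_truncation(pattern, vocabulary=VOCABULARY):
    """Return the vocabulary words matched by a pattern such as 'librar*'."""
    pattern = pattern.lower()
    return [word for word in vocabulary if fnmatch.fnmatchcase(word, pattern)]

print(expand_truncation("librar*"))
```

Without this feature, the searcher has to type the variants explicitly, for instance combined with OR.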

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
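
Clustering on the fly can be illustrated with a very crude method: each retrieved snippet is labelled with the term it shares with the most other snippets. This is only a sketch to show the principle; it is NOT the method used by Vivisimo, and the snippets, the stop word list and the function cluster_results are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared term they contain."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(words - STOPWORDS - query_words)
    # Number of snippets in which each term occurs.
    doc_freq = Counter(term for terms in term_sets for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        if terms:
            # Most frequent term; ties are broken alphabetically.
            label = min(terms, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

SNIPPETS = [
    "Pascal programming tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for programming",
    "The philosopher Pascal and his Pensees",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, members)
```

Only the retrieved results are analysed, after the search, so no pre-processing of the complete document collection is needed.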

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
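
The integration of results with removal of repeated results can be sketched as follows. The result lists are hypothetical, and the round-robin merging and the crude URL normalisation are illustrative choices, not the method of any particular meta-search system. Only the host name is lowercased, because the path of a URL can be case-sensitive.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Crude key for comparing URLs: host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit returns the host name in lowercase
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.vub.ac.be/", "http://bubl.ac.uk/link/"]
ENGINE_2 = ["http://bubl.ac.uk/link", "http://vub.ac.be", "http://www.dmoz.org/"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
print(MERGED)
```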

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
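
The distinction between modified and new pages can be illustrated with a minimal sketch. It assumes that two snapshots of a set of pages (URL mapped to page text) have already been collected; the URLs, the texts and the function names `fingerprint` and `compare_snapshots` are hypothetical and do not describe how any of the alert services mentioned here actually work.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable fingerprint of a page's text content."""
    normalised = " ".join(page_text.split())  # ignore whitespace-only edits
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def compare_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots (URL -> page text); report new and modified URLs."""
    report = {"new": [], "modified": []}
    for url, text in new.items():
        if url not in old:
            report["new"].append(url)
        elif fingerprint(old[url]) != fingerprint(text):
            report["modified"].append(url)
    return report


# Hypothetical snapshots taken on two different days.
yesterday = {
    "http://example.org/a": "Marine science news",
    "http://example.org/b": "Fishing   statistics 2004",
}
today = {
    "http://example.org/a": "Marine science news, updated",
    "http://example.org/b": "Fishing statistics 2004",
    "http://example.org/c": "A new page about oceanography",
}
report = compare_snapshots(yesterday, today)
print(report)
```

An alert service would additionally have to fetch the pages on a schedule and send the report by email; both steps are left out of this sketch.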

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
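
The idea of simultaneous searching can be sketched as follows. This is a minimal illustration under stated assumptions: the two "catalogues" are small in-memory lists invented for the example, and the names `search_catalogue` and `meta_search` are hypothetical; a real meta-search service queries remote systems, for instance through Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "catalogues"; a real service would query remote systems.
CATALOGUES = {
    "library_a": [
        {"isbn": "111", "title": "Introduction to Oceanography"},
        {"isbn": "222", "title": "Marine Ecology"},
    ],
    "bookshop_b": [
        {"isbn": "222", "title": "Marine Ecology"},
        {"isbn": "333", "title": "Marine Pollution"},
    ],
}


def search_catalogue(name: str, word: str) -> list[dict]:
    """Search one catalogue for titles containing the word (case-insensitive)."""
    return [
        dict(record, source=name)
        for record in CATALOGUES[name]
        if word.lower() in record["title"].lower()
    ]


def meta_search(word: str) -> list[dict]:
    """Query all catalogues in parallel and merge records that share an ISBN."""
    names = sorted(CATALOGUES)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda n: search_catalogue(n, word), names))
    merged: dict[str, dict] = {}
    for results in result_lists:
        for record in results:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(record["source"])
    return [dict(isbn=isbn, **entry) for isbn, entry in sorted(merged.items())]


hits = meta_search("marine")
for hit in hits:
    print(hit)
```

Only a single-word title search is offered here because it is the lowest common denominator of the target systems, which mirrors the limited search options noted above.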

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
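
The use of a thesaurus before searching can be sketched as a small query-expansion step. The thesaurus entries below are made up for illustration, and the function name `expand_query` and the Boolean `OR` syntax are assumptions; the query syntax of a real database should be checked in its help pages.

```python
# A tiny illustrative thesaurus; the entries are made up for this example.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms / used-for terms
        "NT": ["coastal sea"],     # narrower terms
        "BT": ["body of water"],   # broader terms
        "RT": ["marine science"],  # related terms
    },
}


def expand_query(term: str, relations: tuple[str, ...] = ("UF", "NT")) -> str:
    """Build a Boolean OR query from a term and its selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
print(expand_query("sea", relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms tends to increase recall; broader and related terms are left out by default in this sketch because they may lower precision.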

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
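
Both directions of citation searching can be sketched on a small citation graph. The document identifiers and the reference lists below are hypothetical, as are the function names `snowball` and `citing_documents`; a real citation index holds this information for millions of documents.

```python
from collections import deque

# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older1"],
    "old2": [],
    "older1": [],
    "recent1": ["seed"],
    "recent2": ["seed", "old1"],
}


def snowball(seed: str) -> list[str]:
    """Follow reference lists backwards in time, breadth-first (snowball effect)."""
    found, queue = [], deque([seed])
    while queue:
        for cited in REFERENCES.get(queue.popleft(), []):
            if cited not in found and cited != seed:
                found.append(cited)
                queue.append(cited)
    return found


def citing_documents(target: str) -> list[str]:
    """Look up more recent documents that cite the target (a citation index)."""
    return sorted(doc for doc, refs in REFERENCES.items() if target in refs)


print(snowball("seed"))
print(citing_documents("seed"))
```

The first function only needs the reference lists of documents already in hand, whereas the second needs an index over all documents, which is why forward searching depends on a citation database.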

211

***-

Information retrieval:
using citations (scheme)
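[Scheme: on a time axis (past, now, future), snowball citation searching leads from the seed document back to older publications, while citation indexing leads forward to more recent publications that cite it.]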

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
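
Estimating the number of citations received, per article and per author, amounts to counting the results of a citation search. The sketch below assumes that such results are already available as simple records; the records themselves are invented for the example.

```python
from collections import Counter

# Hypothetical citation search results:
# (citing article, cited article, author of the cited article).
CITATIONS = [
    ("paper_x", "article_1", "Author A"),
    ("paper_y", "article_1", "Author A"),
    ("paper_y", "article_2", "Author A"),
    ("paper_z", "article_3", "Author B"),
]

citations_per_article = Counter(cited for _, cited, _ in CITATIONS)
citations_per_author = Counter(author for _, _, author in CITATIONS)

print(citations_per_article.most_common())
print(citations_per_author.most_common())
```

Such counts remain estimates: they depend on which citing documents the database covers and on how consistently author names are spelled.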

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
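
Generating the variants listed above can be automated with a few lines of code. This is a minimal sketch: the function name `url_variants` is hypothetical, and it only covers the trailing-slash and `index.html` / `index.htm` cases, not other default file names that a web server may use.

```python
def url_variants(url: str) -> list[str]:
    """Return common spellings of the same address for use in link searches."""
    base = url
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            base = base[: -len(suffix)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


# Hypothetical address, following the pattern used above.
for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the result sets combined.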

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
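
The difference between the two approaches can be sketched with the HTML parser from the Python standard library: a page may cite a URL in a hyperlink field, in its visible text, or both. The class name `CitationFinder`, the function `classify` and the two sample pages are hypothetical; the sketch checks pages that are already in hand and does not query any search engine.

```python
from html.parser import HTMLParser


class CitationFinder(HTMLParser):
    """Record whether a page links to, or merely mentions, a target URL."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.linked = False     # target appears in an <a href="..."> attribute
        self.mentioned = False  # target appears in the visible text

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target:
            self.linked = True

    def handle_data(self, data):
        if self.target in data:
            self.mentioned = True


def classify(html: str, target: str) -> tuple[bool, bool]:
    """Return (linked, mentioned) for one page."""
    finder = CitationFinder(target)
    finder.feed(html)
    finder.close()
    return finder.linked, finder.mentioned


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
page_with_link = f'<p>See <a href="{TARGET}">this page</a>.</p>'
page_with_text = f"<p>Available from: {TARGET}</p>"

print(classify(page_with_link, TARGET))
print(classify(page_with_text, TARGET))
```

A search restricted to hyperlink fields would find only the first page, while a normal search for the URL string would find only the second, so the two methods complement each other.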

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
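
The "simply stated" rule above can be sketched as follows. This is only an illustration of counting received links. The link data, the site names and the function name `rank_by_inlinks` are assumptions made for the example, and the real Google ranking is more elaborate than a plain count.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs found on the WWW.
LINKS = [
    ("site-a", "site-c"),
    ("site-b", "site-c"),
    ("site-d", "site-c"),
    ("site-a", "site-b"),
    ("site-c", "site-b"),
    ("site-b", "site-a"),
]

# Sites listed in one directory category, in alphabetical order (as in DMOZ).
CATEGORY = ["site-a", "site-b", "site-c", "site-d"]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of links they receive, highest first."""
    inlinks = Counter(target for _source, target in links)
    # Sites with the same number of received links keep alphabetical order.
    return sorted(sites, key=lambda site: (-inlinks[site], site))


print(rank_by_inlinks(CATEGORY, LINKS))
# prints: ['site-c', 'site-b', 'site-a', 'site-d']
```

Here site-c receives three links, site-b two, site-a one and site-d none, so the alphabetical list is reordered accordingly.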

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
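
The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any real search engine. The page texts, the example.org URLs and the function names `build_index` and `search` are assumptions made for the example, and the "robot" is replaced by a small dictionary of pages that are supposed to have been collected already.

```python
from collections import defaultdict

# Hypothetical pages already collected by the "robot" (URL -> page text).
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine sediments",
    "http://example.org/c": "open access journals in biology",
}


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(sorted(search(index, "marine biology")))
# prints: ['http://example.org/a']
```

Note that the query is answered from the index alone: no page is fetched from the Internet at query time, which is why such systems can respond quickly.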

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
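
The two ranking criteria above (many pages point to a page, "important" pages point to a page) can be illustrated with a simplified link-analysis score that is computed iteratively. This is a sketch under stated assumptions, not Google's actual ranking: the small link graph, the function name `link_scores`, the damping value 0.85 and the number of iterations are all chosen for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores high when many pages, or
    high-scoring pages, link to it. `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: page -> pages it links to.
GRAPH = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["d"],
    "e": ["f"],
}

scores = link_scores(GRAPH)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph, "hub" scores higher than "f" because it receives three links instead of one. Pages "d" and "f" each receive a single link, but "d" scores higher because its link comes from the high-scoring "hub".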

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
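
What such an expansion amounts to can be sketched as follows. How Google implements the tilde internally is not described here, so this only shows the manual equivalent: each marked word is replaced by an OR-group of the word and its synonyms. The synonym table and the function name `expand_query` are hypothetical.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("cheap ~car rental"))
# prints: cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table contains synonyms for it.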

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!
(All the Web and AltaVista since 2003).

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
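
Recall and precision can be made concrete with a small calculation: precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved. The document identifiers and the two result sets below are invented for illustration, and the function name `precision_recall` is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved documents and relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers: d* are relevant, x* are not.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = {"d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"}
specialised_engine = {"d1", "d2", "d3", "x1"}

print(precision_recall(general_engine, relevant))      # (0.25, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```

In this invented case the specialised system retrieves fewer documents in total, but a larger share of them is relevant and fewer relevant documents are missed.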

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
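
For readers who are not familiar with truncation: in systems that do support it, a query term such as librar* matches every index word that starts with "librar". A minimal sketch, with a hypothetical index vocabulary and a hypothetical function name `expand_truncation`:

```python
# Hypothetical vocabulary of an index (all words found in the indexed pages).
VOCABULARY = ["librarian", "libraries", "library", "libretto", "literature"]


def expand_truncation(term, vocabulary=VOCABULARY):
    """Expand a right-truncated term such as 'librar*' to all matching index words."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(word for word in vocabulary if word.startswith(stem))
    return [term.lower()]


print(expand_truncation("librar*"))
# prints: ['librarian', 'libraries', 'library']
```

Without such a feature, the searcher has to type the variants explicitly, for instance by combining them with OR.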

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
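
A proximity operator such as NEAR, where it is offered, only accepts documents in which the two query words occur close to each other. The sketch below shows the idea on one text. The function name `near`, the example sentence and the default distance of 3 words are assumptions made for the example.

```python
def near(text, word1, word2, distance=3):
    """Return True when word1 and word2 occur within `distance` words of each other."""
    words = text.lower().split()
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in positions1 for j in positions2)


doc = "open access journals are listed in a directory of scholarly journals"
print(near(doc, "access", "journals"))  # True: the words are adjacent
print(near(doc, "open", "directory"))   # False: 7 positions apart
```

A plain AND query would accept the document in both cases, which is why a proximity operator can increase precision.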

96

**--

Meta-search systems:
scheme 1
[Diagram. Labels: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Labels: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Labels: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
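
The idea of clustering results on the fly can be illustrated with a deliberately naive sketch. It is not Vivisimo's method, which is not described here: each result title is simply grouped under the word it shares with the largest number of other titles. The result titles, the stopword list and the function name `cluster_results` are assumptions made for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}

# Hypothetical result titles for the ambiguous query "pascal".
RESULTS = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Thoughts of the philosopher Blaise Pascal",
    "Free Pascal programming compiler",
]


def cluster_results(titles, query):
    """Group result titles under the most widely shared word they contain."""
    def terms(title):
        words = title.lower().split()
        return {w for w in words if w not in STOPWORDS and w != query.lower()}

    # Count in how many titles each word occurs.
    doc_freq = Counter(word for title in titles for word in terms(title))
    clusters = defaultdict(list)
    for title in titles:
        candidates = terms(title)
        if candidates:
            # Most widely shared word; ties are broken alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


for label, titles in cluster_results(RESULTS, "pascal").items():
    print(label, "->", titles)
```

With these titles, the two results about the philosopher end up under "blaise" and the two about the programming language under "programming". Everything is computed from the retrieved titles at query time, without any processing of the documents beforehand.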

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Labels: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
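
The integration step can be sketched as follows: the ranked lists from the primary search systems are interleaved, and a URL that was already seen is dropped. This is a minimal sketch, not the method of any particular meta-search system. The example URLs and the function names `normalise` and `merge_results` are assumptions, and the normalisation rule (ignore "www.", upper/lower case in the host name, a trailing slash and any query string) is a simplification.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/"]

print(merge_results(engine_1, engine_2))
# prints: ['http://www.example.org/', 'http://example.com/page', 'http://example.net/']
```

The first hit of the second system is recognised as a repeat of the first hit of the first system, so the merged list contains three results instead of four.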

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet and the WWW. Labels:
- telnet, ftp, ...
- databases and file archives accessible through the Internet (CGI, ASP, ...)
- static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes
- information accessible only when passwords are used
- rapidly changing information, such as news
- Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
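
The tracking of changes in a page, as described above, can be sketched as follows. This is only a minimal illustration, not the method of any particular alert service: it assumes that a tracker stores a fingerprint (here a SHA-256 digest) of the page and compares it with a later version. The page contents and the function names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a SHA-256 hex digest that identifies one version of a page."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(stored_fingerprint, page_bytes):
    """Return True when the newly fetched page differs from the stored version."""
    return fingerprint(page_bytes) != stored_fingerprint


# Hypothetical page contents; a real tracker would fetch them over HTTP.
old_version = b"<html><body>Conference programme: draft</body></html>"
new_version = b"<html><body>Conference programme: final</body></html>"

# The tracker keeps only the fingerprint of the version it saw last.
stored = fingerprint(old_version)

print(has_changed(stored, old_version))
print(has_changed(stored, new_version))
```

A server-based alert service would run such a comparison periodically and send an email when the result is positive; finding entirely new pages requires a search index instead.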

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
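
How a stored interest profile could be matched against newly published books can be sketched as follows. This is a minimal illustration under stated assumptions, not the way Amazon or any other system works: the profile, the book records and the function `matches_profile` are hypothetical, and matching is done on whole words in the title or on the subject category.

```python
def matches_profile(book, profile):
    """Return True when a book fits the keywords or the subject categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile, as it might be stored on the server.
profile = {"keywords": ["fisheries", "aquaculture"],
           "categories": ["Marine science"]}

# Hypothetical records of newly published books.
new_books = [
    {"title": "Sustainable fisheries management", "category": "Economics"},
    {"title": "Tides and currents", "category": "Marine science"},
    {"title": "Medieval trade routes", "category": "History"},
]

alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
print(alerts)
```

Keywords give a narrow match on the wording of the title, while subject categories catch books whose titles do not contain the expected words.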

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
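
The merging step of such a meta-search service can be sketched as follows. This is a minimal illustration, assuming that every target system returns records with an ISBN and a title; the source names, the placeholder ISBNs and the function `merge_results` are hypothetical and do not describe any particular service.

```python
def merge_results(result_lists):
    """Merge hit lists from several catalogues, keeping one record per ISBN."""
    merged = {}
    for source, hits in result_lists.items():
        for hit in hits:
            # Create the record on first sight, then note every source that has it.
            record = merged.setdefault(
                hit["isbn"], {"title": hit["title"], "sources": []}
            )
            record["sources"].append(source)
    return merged


# Hypothetical hit lists returned by two target systems (placeholder ISBNs).
RESULTS = {
    "library_a": [{"isbn": "0000000001", "title": "Marine ecology"}],
    "bookshop_b": [
        {"isbn": "0000000001", "title": "Marine ecology"},
        {"isbn": "0000000002", "title": "Coastal management"},
    ],
}

merged = merge_results(RESULTS)
for isbn, record in merged.items():
    print(isbn, record["title"], record["sources"])
```

The sketch also shows why search options are limited: only fields that all target systems return (here ISBN and title) can be used for merging.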

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
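
The use of a thesaurus before searching can be sketched as follows. This is a minimal illustration with a toy thesaurus: the entries are hypothetical, and combining terms with OR and quoting phrases is an assumption about the query syntax of the target database, which should be checked in its help pages.

```python
# Toy thesaurus with hypothetical entries.
# UF = use(d) for (synonyms), NT = narrower terms,
# BT = broader terms, RT = related terms.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water body"],
        "RT": ["marine science"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus its thesaurus terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
```

Adding synonyms and narrower terms mainly serves recall; choosing a more specific term from the thesaurus instead of the original one serves precision.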

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
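
Both directions of citation searching can be sketched as follows. This is a minimal illustration with hypothetical document identifiers, not the way a commercial citation index is built: the reference lists give the backward (snowball) direction directly, and inverting them gives the forward direction that a citation index offers.

```python
from collections import defaultdict

# Hypothetical reference lists: each document maps to the documents it cites.
REFERENCES = {
    "seed-2001": ["a-1995", "b-1998"],
    "c-2003": ["seed-2001", "b-1998"],
    "d-2004": ["seed-2001"],
}


def build_citation_index(references):
    """Invert the reference lists: map each cited document to its citing documents."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


index = build_citation_index(REFERENCES)

# Backward: older publications cited by the seed document.
older = REFERENCES["seed-2001"]
# Forward: more recent publications that cite the seed document.
newer = index.get("seed-2001", [])

print(older)
print(newer)
```

Repeating the backward step on each document found gives the snowball effect; the length of a forward list is the number of citations received by that document within the indexed collection.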

211

***-

Information retrieval:
using citations (scheme)
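[Scheme: a time axis from Past through Now to Future, with the seed
document on it. Snowball citation searching leads from the seed
document to older publications; citation indexing leads to more
recent publications that cite the seed document.]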

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
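
The variants listed above can be generated from one known URL, as in the following sketch. It is a minimal illustration: the function `link_variants` is hypothetical, the default file names `index.html` and `index.htm` are assumptions, and the actual link-search syntax differs per search system.

```python
DEFAULT_PAGES = ("index.html", "index.htm")  # assumed default file names


def link_variants(url):
    """Return the URL variants worth trying in a link search."""
    # Drop the scheme ("http://") and any trailing slash.
    address = url.split("://", 1)[-1].rstrip("/")
    # Drop a default page name, so that all variants share the same base.
    for page in DEFAULT_PAGES:
        if address.endswith("/" + page):
            address = address[: -(len(page) + 1)]
            break
    return [
        "//" + address + "/index.html",
        "//" + address + "/",
        "//" + address,
    ]


for variant in link_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant is then submitted as a separate link query, and the results are combined, because a search system may treat these forms as different target URLs.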

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
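
The idea of ranking directory entries by the number of links they receive can be sketched as follows. This is only an illustration of the principle, not Google's actual method; the link data and the function name `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs.
LINKS = [
    ("site-a", "site-c"),
    ("site-b", "site-c"),
    ("site-a", "site-b"),
    ("site-d", "site-c"),
]

def rank_by_inlinks(entries, links):
    """Sort directory entries by the number of links they receive, highest first."""
    inlinks = Counter(target for _source, target in links)
    # Ties are broken alphabetically so that the order is deterministic.
    return sorted(entries, key=lambda site: (-inlinks[site], site))

print(rank_by_inlinks(["site-a", "site-b", "site-c"], LINKS))
```

An alphabetical listing would show site-a first; the link-based ranking puts the most linked-to entry, site-c, on top.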

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
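
The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular search engine: the pages are a hypothetical in-memory dictionary standing in for what a crawler would collect, and the names `build_index` and `search` are invented for this example.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "history of the Pascal programming language",
}

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

Note that `search` only consults the index that was built beforehand; the pages themselves are not read again at query time, which is why such systems can answer quickly.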

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi, has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
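
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch under stated assumptions, not Google's production algorithm: the link graph is hypothetical, and the damping value 0.85 and the function name `link_scores` are choices made for this example.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    or highly scored pages, link to it (a simplified PageRank-style scheme)."""
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score

# Hypothetical link graph: page -> pages it links to.
GRAPH = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
SCORES = link_scores(GRAPH)
print(sorted(SCORES, key=SCORES.get, reverse=True))
```

In this small graph, page c scores highest because three pages link to it, and page a scores higher than b and d because its single incoming link comes from the "important" page c.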

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
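
What such an automatic expansion amounts to can be sketched as follows. This only illustrates the principle and says nothing about how Google implements it; the synonym table and the function name `expand_query` are hypothetical.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("cheap ~car rental"))
```

Only the word marked with the tilde is expanded; "cheap" stays as typed even though the table holds a synonym for it.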

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage

• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003

• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005

• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
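
Recall and precision, the two measures mentioned above, can be made concrete with a short calculation. Precision is the fraction of the retrieved documents that are relevant; recall is the fraction of all relevant documents that were retrieved. The document identifiers below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

A specialised system can improve both figures at once: it returns fewer off-topic documents (precision) and may cover more of the relevant sources in its domain (recall).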

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
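
To show what truncation means in systems that do support it, here is a minimal sketch of right-hand truncation with a trailing asterisk. The asterisk notation, the word list and the function name `truncation_match` are assumptions for this example; retrieval systems differ in the symbol they use.

```python
def truncation_match(pattern, words):
    """Return the words that match a pattern with right-hand truncation,
    written with a trailing asterisk (for example 'librar*')."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # Without an asterisk, only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]

VOCABULARY = ["library", "libraries", "librarian", "liberty", "book"]
print(truncation_match("librar*", VOCABULARY))
```

Without this feature, the searcher has to type the variants explicitly, for instance by combining library, libraries and librarian with OR.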

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
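
A proximity operator such as NEAR can be sketched as a test on word positions. This is an illustration of the concept only; the default distance of 3 words and the function name `near` are assumptions, and real systems define the distance in their own way.

```python
def near(text, word1, word2, max_distance=3):
    """Return True when word1 and word2 occur within max_distance words of each other."""
    tokens = text.lower().split()
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

TEXT = "open access journals in marine science"
print(near(TEXT, "open", "journals"))   # the words are 2 positions apart
print(near(TEXT, "open", "science"))    # the words are 5 positions apart
```

A plain AND query would accept both pairs; the proximity test rejects the second one because the words are too far apart to be likely to belong together.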

96

**--

Meta-search systems:
scheme 1
[Diagram; its labelled elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram; its labelled elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram; its labelled elements: User (In / Out); client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
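
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, at query time. The sketch below shows one very simple way to do this; it is not Vivisimo's algorithm, which is not described in these slides. The stopword list, the result snippets and the function name `cluster_results` are hypothetical.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by"}

def cluster_results(query, snippets):
    """Group result snippets under the most widely shared word they contain
    (other than query words and stopwords)."""
    query_words = set(query.lower().split())
    tokenized = []
    for snippet in snippets:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenized.append(words)
    # Document frequency: in how many snippets does each word occur?
    doc_freq = Counter(w for words in tokenized for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        if words:
            # Label = the word shared by the most snippets; ties broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

RESULTS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Pascal programming compiler download",
    "Pascal philosopher and mathematician",
]
print(cluster_results("pascal", RESULTS))
```

For the ambiguous query pascal, the results fall into a "programming" group and a "philosopher" group, so the searcher can pick the intended meaning.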

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram; its labelled elements: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
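
The integration step, with removal of repeated results, can be sketched as follows. This is one possible approach, not the method of any particular meta-search system: the result lists are hypothetical, and the URL comparison in `normalize` is deliberately crude (it ignores query strings, for instance).

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    followed by the path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked result lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.example.org/a", "http://example.org/b"]
ENGINE_2 = ["http://example.org/a/", "http://example.org/c"]
print(merge_results(ENGINE_1, ENGINE_2))
```

Interleaving by rank keeps the top result of each engine near the top of the merged list; the second copy of the same page, written slightly differently, is dropped.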

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram; its labelled elements:]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Rapidly changing information, such as news
• Information accessible only when passwords are used
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
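How such a tracking service might decide that a page has changed can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fingerprint` and `changed_pages` are hypothetical, the page texts are passed in directly (no downloading is shown), and real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text):
    """Return a short fingerprint of a page, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(stored, current_texts):
    """Compare current page texts with the stored fingerprints.

    stored        : dict mapping URL -> fingerprint from the previous check
                    (updated in place)
    current_texts : dict mapping URL -> text of the page as retrieved now
    Returns the list of URLs that are new or modified since the last check.
    """
    alerts = []
    for url, text in current_texts.items():
        new_print = fingerprint(text)
        if stored.get(url) != new_print:
            alerts.append(url)  # new page, or existing page with modified content
        stored[url] = new_print
    return alerts
```

A service built on this idea would run the check periodically and send the returned URLs to the subscriber by email.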

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
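The merging step of such a meta-search service can be sketched as follows. This is a minimal illustration, not the method of any system named here: the function `merge_results`, the record fields `isbn` and `title`, and the convention that an unavailable target system is represented by `None` are all assumptions made for the example.

```python
def merge_results(result_lists):
    """Merge the results from several catalogues into one list without duplicates.

    result_lists : dict mapping the name of a target system to its list of
                   records (each a dict with "isbn" and "title"), or to None
                   when that target system was not available.
    Returns a dict mapping ISBN -> {"title": ..., "sources": [...]}.
    """
    merged = {}
    for source, records in result_lists.items():
        if records is None:
            continue  # the result depends on the availability of each target system
        for record in records:
            entry = merged.setdefault(
                record["isbn"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(source)
    return merged
```

Each merged entry lists the sources in which the book was found, which is what gives the combined search its broad coverage.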

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
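The procedure described above (look up a term in a thesaurus, then use the terms found to formulate a query) can be sketched in Python. The thesaurus entry below is a small invented example, not taken from any of the thesaurus systems mentioned in these slides, and the `OR` query syntax is an assumption; check the help pages of the database that you want to search.

```python
# Illustrative thesaurus entry only; the related terms are assumptions.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],  # broader term(s)
        "NT": ["inland sea"],     # narrower term(s)
        "RT": ["coast"],          # related term(s)
        "UF": ["ocean"],          # synonym(s)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    # Terms of several words are quoted so that they are searched as a phrase.
    return " OR ".join('"%s"' % t if " " in t else t for t in terms)
```

Adding synonyms and narrower terms mainly increases recall; choosing a more specific term found in the thesaurus instead of the original word can increase precision.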

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
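The two directions of citation searching can be illustrated with a small citation graph in Python. The document names and the helper functions `backward` and `forward` are hypothetical; a real citation index holds this kind of data for millions of publications.

```python
# Each document is mapped to the list of documents that it cites (invented data).
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def backward(doc, cites):
    """Snowball searching: follow the references of doc, then their references, and so on."""
    found, to_visit = set(), [doc]
    while to_visit:
        for ref in cites.get(to_visit.pop(), []):
            if ref not in found:
                found.add(ref)
                to_visit.append(ref)
    return found


def forward(doc, cites):
    """Citation index lookup: find the documents that cite doc."""
    return {citing for citing, refs in cites.items() if doc in refs}
```

Here `backward("seed", CITES)` leads to the older documents, while `forward("seed", CITES)` leads to the more recent ones; the size of the forward set is the number of citations received by the seed document.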

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (Past, Now, Future) with the seed document; snowball citation searching points to the past, citation indexing points to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
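Generating these variants can be automated with a small helper. The sketch below is an assumption-laden illustration: the function name `url_variants` is hypothetical, `index.html` is assumed to be the default page of the site, and the link-search syntax itself still depends on the search system used.

```python
def url_variants(url):
    """Return the forms of a URL that should all be tried in a link search."""
    bare = url.split("://", 1)[-1]  # drop "http://" if present
    variants = [bare]
    if bare.endswith("/index.html"):
        stem = bare[: -len("index.html")]
        variants.append(stem)              # .../website/
        variants.append(stem.rstrip("/"))  # .../website
    elif bare.endswith("/"):
        variants.append(bare.rstrip("/"))  # .../website
    return variants
```

Each returned variant is then used in a separate link search, and the results are combined.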

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 192

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance.
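
The "more links received, higher rank" idea can be illustrated with a minimal sketch. The site names and the link list are invented for illustration, and counting incoming links is only an assumed simplification, not Google's actual method.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("d.example", "c.example"),
    ("c.example", "b.example"),
]

def rank_by_inlinks(sites, links):
    """Sort sites by the number of links they receive, highest first."""
    inlinks = Counter(target for _, target in links)
    # Ties are broken alphabetically so that the order is deterministic.
    return sorted(sites, key=lambda s: (-inlinks[s], s))

# Hypothetical sites listed in one directory category.
directory_category = ["a.example", "b.example", "c.example", "d.example"]
ranked = rank_by_inlinks(directory_category, links)
print(ranked)
```

An alphabetical listing would put a.example first; the link count moves c.example to the top because three other sites point to it.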

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory (covering the complete WWW) can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
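
The two components described above (a database with an index built from page texts, and a search function on top of it) can be sketched in a few lines. This is only a toy illustration: the page URLs and texts are invented, no real crawling is done, and combining the query words with Boolean AND is an assumption made for the example.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected.
pages = {
    "http://example.org/fish": "Fishing and marine science resources",
    "http://example.org/ocean": "Marine science and oceanography portal",
    "http://example.org/books": "Finding books about science",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(pages)
hits = search(index, "marine science")
print(sorted(hits))
```

The query is answered from the prebuilt index only; the pages themselves are not contacted at search time, which is why such a system can answer quickly.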

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
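
A link-based ranking in which both the number and the "importance" of the pointing pages count can be sketched as an iterative calculation. This is a simplified illustration in the spirit of published link-analysis methods, not Google's actual algorithm; the four-page graph, the damping value and the function name are assumptions.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively compute a link-based score for each page.

    graph maps each page to the list of pages it links to.
    """
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical miniature web of four pages.
graph = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = link_scores(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph page C ranks first because three pages point to it. Page A receives only one link, but that link comes from the "important" page C, so A ranks above B and D.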

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
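
What such an automatic expansion amounts to can be sketched as follows. The synonym table is a tiny invented example and the OR-group output format is an assumption; how Google chooses synonyms internally is not described here.

```python
# Hypothetical miniature synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fishing": ["angling", "fisheries"],
}

def expand_query(query):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("marine ~fishing statistics")
print(expanded)
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.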

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004.)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
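
The two quality measures named above can be made concrete with a small calculation. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. The document identifiers and the two result lists below are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document identifiers.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "d5", "d6", "d7", "d8"]
specialised_engine = ["d1", "d2", "d3", "d9"]

general = precision_recall(general_engine, relevant)
specialised = precision_recall(specialised_engine, relevant)
print(general, specialised)
```

In this invented case the specialised engine retrieves three of the four relevant documents with only one irrelevant hit, so both measures are higher than for the general engine.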

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
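
For readers unfamiliar with these two features, the following minimal sketch shows what they do in systems that offer them. The vocabulary is invented and the suffix stripping is deliberately crude; it is not a real stemming algorithm.

```python
# Hypothetical vocabulary of indexed words.
vocabulary = ["library", "libraries", "librarian", "liberal",
              "search", "searching", "searched"]

def truncate_match(pattern, words):
    """Manual truncation: 'librar*' matches every word starting with 'librar'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [w for w in words if w.startswith(prefix)]
    return [w for w in words if w == pattern]

def crude_stem(word):
    """Very rough suffix stripping, for illustration only."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

truncated = truncate_match("librar*", vocabulary)
stems = {w: crude_stem(w) for w in ["search", "searching", "searched"]}
print(truncated, stems)
```

With truncation the searcher decides where the word is cut; with stemming the system reduces word variants to a common form automatically.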

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
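
What a proximity operator such as NEAR checks can be sketched as follows. The function name, the default distance of 3 words and the two sample texts are assumptions made for this illustration.

```python
import re

def near(text, term_a, term_b, max_distance=3):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

doc1 = "Marine science portals help researchers in oceanography."
doc2 = "Marine life is rich. Much later in the text we mention library science."

result1 = near(doc1, "marine", "science")
result2 = near(doc2, "marine", "science")
print(result1, result2)
```

Both texts contain both words, so a plain AND query retrieves both; only the first one satisfies the proximity condition.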

96

**--

Meta-search systems:
scheme 1
(Diagram: User (in / out) <-> client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems)

97

**--

Meta-search systems:
scheme 2
(Diagram: User (in / out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems)

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
(Diagram: User (in / out) <-> client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems)

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
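
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves at search time. The following toy sketch groups result snippets under the most widely shared word that is not in the query. It is an assumed simplification for illustration, not the method used by Vivisimo; the snippets and the stop word list are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())
    tokenized = []
    for text in snippets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        tokenized.append(words - STOPWORDS - query_words)
    # Count in how many snippets each candidate label word occurs.
    doc_freq = Counter(w for words in tokenized for w in words)
    clusters = defaultdict(list)
    for text, words in zip(snippets, tokenized):
        if words:
            # Highest snippet count wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(text)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Learn Pascal programming",
    "Pascal the philosopher: biography",
]
clusters = cluster_results("pascal", snippets)
for label, items in clusters.items():
    print(label, "->", items)
```

The ambiguous query yields two groups, one around "programming" and one around "philosopher", so the user can pick the intended meaning.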

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
(Diagram: User (in / out) <-> client computer + multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems)

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
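
The integration step with removal of repeated results can be sketched as follows. The result lists are invented, the interleaving by rank is one possible merging strategy among many, and the URL normalisation is deliberately crude.

```python
def normalise(url):
    """Crude URL normalisation so that trivial variants count as duplicates."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url

def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/a"]
engine_2 = ["http://example.org", "http://example.net/b", "http://example.com/a/"]

merged = merge_results(engine_1, engine_2)
print(merged)
```

Five raw results are reduced to three, because two URLs differ only in the "www." prefix or a trailing slash.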

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
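[Scheme: the parts of the Internet; only the static texts in the WWW are covered, and only partly, by Internet indexes. Labels in the scheme:]
• Internet: telnet, ftp, ..., WWW
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet
• CGI, ASP,...
• Information accessible only when passwords are used
• Rapidly changing information, such as news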

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
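
The difference between a modified page and a new page can be sketched as follows. This is a minimal illustration, not the method of any particular alert service: it assumes that the service stores a fingerprint (here a SHA-256 hash) of each tracked page, and the names fingerprint and changed_pages as well as the example URLs are hypothetical. The page contents are given as strings instead of being downloaded.

```python
import hashlib
from typing import Dict


def fingerprint(page_content: str) -> str:
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def changed_pages(stored: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Compare stored fingerprints with current page contents.

    stored maps URL -> fingerprint saved at the previous visit;
    current maps URL -> page content found now.
    Returns URL -> 'new' or 'modified'; unchanged pages are left out.
    """
    report: Dict[str, str] = {}
    for url, content in current.items():
        if url not in stored:
            report[url] = "new"
        elif stored[url] != fingerprint(content):
            report[url] = "modified"
    return report


# Hypothetical example data.
stored = {
    "http://example.org/news": fingerprint("old text"),
    "http://example.org/about": fingerprint("same text"),
}
current = {
    "http://example.org/news": "new text",
    "http://example.org/about": "same text",
    "http://example.org/extra": "hello",
}
print(changed_pages(stored, current))
```

Tracking a modification only needs the list of pages that the user already knows; finding a new page needs a source of candidate pages, such as the results of a stored search query.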

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: ?

AIM: To find book titles published before 1990
RECOMMENDED SYSTEMS: ?

AIM: To find a book title through a title search
RECOMMENDED SYSTEMS: ?

AIM: To find the price of a book
RECOMMENDED SYSTEMS: ?

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
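
How a stored interest profile can be matched against newly published books can be sketched as follows. This is a simplified illustration and not the way any bookshop actually works: the classes Book and InterestProfile, the function alerts and the example books are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Book:
    title: str
    subjects: Set[str] = field(default_factory=set)


@dataclass
class InterestProfile:
    """Interest profile as stored on the server of the alert system."""
    keywords: Set[str] = field(default_factory=set)
    subject_categories: Set[str] = field(default_factory=set)

    def matches(self, book: Book) -> bool:
        # A keyword matches when it occurs as a word in the title.
        title_words = set(book.title.lower().split())
        keyword_hit = bool({k.lower() for k in self.keywords} & title_words)
        # A subject category matches when the book is assigned to it.
        category_hit = bool(self.subject_categories & book.subjects)
        return keyword_hit or category_hit


def alerts(profile: InterestProfile, new_books: List[Book]) -> List[str]:
    """Return the titles of the new books that fit the profile."""
    return [book.title for book in new_books if profile.matches(book)]


profile = InterestProfile(
    keywords={"oceanography"},
    subject_categories={"Marine biology"},
)
new_books = [
    Book("Introduction to Oceanography", {"Earth sciences"}),
    Book("Coral Reefs", {"Marine biology"}),
    Book("Cooking for Beginners", {"Food"}),
]
print(alerts(profile, new_books))
```

The first book is found through a keyword in its title, the second through its subject category; the third fits neither part of the profile.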

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term in a thesaurus:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
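
This use of a thesaurus before searching can be sketched as follows. The sketch assumes a tiny, hypothetical thesaurus held in a Python dictionary and a target database that accepts OR between terms and double quotes around phrases; the entries and the function expand_query are invented for the illustration.

```python
from typing import Dict, List, Sequence

# Tiny hypothetical thesaurus; the codes follow the BT / NT / RT / UF notation.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
    },
}


def expand_query(term: str, relations: Sequence[str] = ("UF", "NT")) -> str:
    """Build an OR query from a term and the terms related to it.

    By default synonyms (UF) and narrower terms (NT) are added, which
    mainly serves to increase recall.
    """
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Phrases of several words are put between double quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("BT",)))
```

Replacing a broad term by one of its narrower terms, instead of adding terms with OR, works in the other direction and tends to increase precision.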

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
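
Both directions of citation searching can be sketched with a small data structure. The sketch is an illustration only: the document identifiers and reference lists are hypothetical, and real citation indexes are built from far larger and less tidy data.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical data: each document is mapped to the older documents it cites.
REFERENCES: Dict[str, List[str]] = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "recent1": ["seed"],
    "recent2": ["seed", "old1"],
}


def build_citation_index(references: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the reference lists: cited document -> documents citing it."""
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def snowball(references: Dict[str, List[str]], seed: str) -> Set[str]:
    """Follow reference lists repeatedly, back to ever older publications."""
    found: Set[str] = set()
    to_visit = [seed]
    while to_visit:
        document = to_visit.pop()
        for cited in references.get(document, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


print(sorted(snowball(REFERENCES, "seed")))
print(sorted(build_citation_index(REFERENCES)["seed"]))
```

The snowball search needs only the reference lists of documents already in hand, whereas finding the more recent citing publications requires the inverted index, which is what a citation index provides.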

211

***-

Information retrieval:
using citations (scheme)
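[Scheme: a time axis (past, now, future) with a seed document. Snowball citation searching leads from the seed document back to older publications; citation indexing leads forward to more recent publications that cite the seed document.]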

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
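
Generating these variants can be automated with a small helper. The sketch below makes two assumptions: that the default page of the site is named index.html (servers may use other names), and that each variant is then entered separately in the link search of the chosen search system, whose own query syntax is not shown here. The function url_variants is hypothetical.

```python
from typing import List


def url_variants(url: str) -> List[str]:
    """Return the variants of a site URL that other pages may link to."""
    base = url
    # Strip a trailing '/index.html' or '/' to obtain the bare site address.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Whichever of the three forms is given as input, the same three variants are produced, so none is forgotten when counting links to a site.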

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 193

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
Other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
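
The difference between the two orderings can be sketched in Python. This is only an illustration: the site names and the link data are invented, and simply counting received links is a strong simplification of the link analysis that Google applies.

```python
# Hypothetical link data: (linking page, site that receives the link).
# All names are invented for illustration.
LINKS = [
    ("p1", "site-a"), ("p2", "site-a"), ("p3", "site-a"),
    ("p1", "site-b"),
    ("p2", "site-c"), ("p4", "site-c"),
]

# Sites listed in one (hypothetical) directory category.
CATEGORY = ["site-b", "site-c", "site-a"]


def inlink_counts(links):
    """Count how many links each site receives."""
    counts = {}
    for _source, target in links:
        counts[target] = counts.get(target, 0) + 1
    return counts


def rank_by_inlinks(sites, links):
    """Order sites by links received; ties fall back to alphabetical order."""
    counts = inlink_counts(links)
    return sorted(sites, key=lambda s: (-counts.get(s, 0), s))


ALPHABETICAL = sorted(CATEGORY)               # the DMOZ-style ordering
BY_LINKS = rank_by_inlinks(CATEGORY, LINKS)   # the link-based ordering
print(ALPHABETICAL)  # ['site-a', 'site-b', 'site-c']
print(BY_LINKS)      # ['site-a', 'site-c', 'site-b']
```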

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
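
The two components described above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: the three "pages" and their URLs are invented, the "robot" is replaced by a fixed dictionary of already collected texts, and a query is treated as an AND combination of its words. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" might have collected (invented examples).
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science data",
    "http://example.org/c": "Pascal programming language tutorial",
}


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: each word maps to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine"))          # pages a and b
print(search(INDEX, "marine biology"))  # page a only
```

Note that the query is answered from the stored index only; no page is fetched from the Internet at search time, which is why such systems can respond quickly.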

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
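
The idea that links from "important" pages count for more can be illustrated with a small iterative calculation. The sketch below is NOT the algorithm actually used by Google. It is a simplified version of the well-known PageRank idea, the four-page web is invented, and the damping value of 0.85 is an assumption chosen because it is commonly quoted.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Score pages iteratively; graph maps each page to the pages it links to.

    Every page named as a link target must also be a key of graph.
    """
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score


# Hypothetical web of four pages (invented for illustration).
GRAPH = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

SCORES = link_rank(GRAPH)
RANKING = sorted(SCORES, key=SCORES.get, reverse=True)
print(RANKING)
```

In this small example page C receives two links and page D only one, yet D ends up highest, because its single link comes from the well-linked page C.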

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
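
What such an automatic expansion amounts to can be sketched as follows. The synonym table is invented for illustration and the function name expand_query is hypothetical. How Google selects synonyms internally is not documented here, so this only shows the principle of turning a marked word into an OR-group.

```python
# Hypothetical synonym table (invented; a real system would use a large thesaurus).
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("used ~car prices"))
# used (car OR automobile OR vehicle) prices
```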

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
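
The terms recall and precision used on this slide can be made concrete with a small calculation. The document identifiers and both result sets below are invented. The sketch only shows how the two measures are computed, not how any particular search engine performs.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved items that are relevant.
    Recall = share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical situation: 4 documents are relevant to the information need.
RELEVANT = {"d1", "d2", "d3", "d4"}

# Invented result sets of a general and of a specialised search system.
GENERAL = ["d1", "d2", "x1", "x2", "x3", "x4", "x5", "x6"]
SPECIALISED = ["d1", "d2", "d3", "x1"]

print(precision_recall(GENERAL, RELEVANT))      # (0.25, 0.5)
print(precision_recall(SPECIALISED, RELEVANT))  # (0.75, 0.75)
```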

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
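
To make clear what is missing, the sketch below shows what truncation and a proximity operator would do on a single text. The sample sentence, the function names and the default distance of 3 words are all assumptions made for this illustration. They do not describe the syntax of any particular retrieval system.

```python
import re


def words_of(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def truncation_match(pattern, text):
    """Find the words that match a right-truncated term such as 'librar*'."""
    stem = pattern.rstrip("*").lower()
    return [w for w in words_of(text) if w.startswith(stem)]


def near(word1, word2, text, distance=3):
    """True if word1 and word2 occur within `distance` words of each other."""
    words = words_of(text)
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in pos1 for j in pos2)


TEXT = ("The library and other libraries employ a librarian "
        "for marine science information.")

print(truncation_match("librar*", TEXT))     # ['library', 'libraries', 'librarian']
print(near("marine", "information", TEXT))   # True (2 words apart)
print(near("library", "information", TEXT))  # False (10 words apart)
```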

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
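
Clustering "on the fly" means that the grouping is computed from the retrieved results themselves, at the moment of the search. The sketch below shows one very simple way to do this. It is NOT the method used by Vivisimo, which is not described in these slides. The result titles are invented, and the label of a cluster is simply the most widely shared word that is not a query word or a stopword.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on", "by"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared term that each contains."""
    query_words = set(query.lower().split())
    term_sets = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        term_sets.append(words - STOPWORDS - query_words)
    # In how many snippets does each term occur?
    frequency = Counter(t for terms in term_sets for t in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, term_sets):
        # Label: the term found in most snippets; ties are broken alphabetically.
        if terms:
            label = min(terms, key=lambda t: (-frequency[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented result titles for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Turbo Pascal programming compiler",
    "Pensees by the philosopher Blaise Pascal",
]

CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, group in CLUSTERS.items():
    print(label, "->", group)
```

With these four titles the results fall into two groups, one labelled "programming" and one labelled "blaise", without any processing of the documents before the search.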

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
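
The integration of results with removal of repeated results can be sketched as follows. The two result lists are invented (they merely reuse addresses mentioned elsewhere in these slides), the URL normalisation is deliberately crude, and interleaving by rank is only one of several possible ways to combine lists. Actual meta-search systems may work differently.

```python
def normalise(url):
    """Crude URL normalisation so that trivial variants count as duplicates."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Invented ranked result lists from two hypothetical search engines.
ENGINE_1 = ["http://www.onefish.org/", "http://oceanportal.org/"]
ENGINE_2 = ["http://oceanportal.org", "http://www.dmoz.org/", "http://onefish.org/"]

MERGED = merge_results([ENGINE_1, ENGINE_2])
print(MERGED)
# ['http://www.onefish.org/', 'http://oceanportal.org', 'http://www.dmoz.org/']
```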

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is accessible through the Internet, and what Internet indexes cover]
• Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...
***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
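The difference between a modified page and a new page can be illustrated with a small sketch. It assumes that the page texts have already been fetched by some other means; the URLs and texts are made up, and the fingerprinting approach is one possible technique, not necessarily what any of the alert services uses.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(snapshots, stored):
    """Compare current page texts with stored fingerprints.

    snapshots: dict mapping URL -> current page text (fetched elsewhere)
    stored:    dict mapping URL -> fingerprint from the previous check
    Returns (new_urls, modified_urls) and updates `stored` in place.
    """
    new_urls, modified_urls = [], []
    for url, text in snapshots.items():
        digest = fingerprint(text)
        if url not in stored:
            new_urls.append(url)          # never seen before: a new page
        elif stored[url] != digest:
            modified_urls.append(url)     # seen before, content differs
        stored[url] = digest
    return new_urls, modified_urls

stored = {}
first = check_pages({"http://example.org/a": "version 1"}, stored)
second = check_pages({"http://example.org/a": "version 2",
                      "http://example.org/b": "hello"}, stored)
```

An alert service would run such a check at regular intervals and send the two lists to the subscriber by email.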

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons between the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
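The use of thesaurus relations to formulate a query can be sketched as follows. The small thesaurus below is hand-made for illustration only (its terms and relations are assumptions, not taken from an existing thesaurus), and the OR syntax is a generic example; the real query syntax depends on the database that is searched.

```python
# A tiny hand-made thesaurus; the terms and relations are illustrative only.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"],
            "RT": ["ocean"], "UF": ["seas"]},
    "ocean": {"BT": ["water body"], "NT": [],
              "RT": ["sea"], "UF": ["oceans"]},
}

def expand(term, relations=("UF", "NT", "RT")):
    """Collect the term plus its synonyms, narrower and related terms."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms

def to_query(terms):
    """Join the terms with OR; multi-word terms become exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

query = to_query(expand("sea"))
```

For the term "sea" this yields the query: sea OR seas OR "coastal sea" OR ocean. Adding synonyms and narrower terms mainly increases recall; choosing more suitable terms from the thesaurus can also improve precision.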

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
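Both directions of citation searching can be illustrated with a small sketch. The citation data below are hypothetical; a real citation index holds such links for millions of publications.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(doc, cites=CITES):
    """Follow reference lists backwards in time, step by step (snow-ball effect)."""
    found, queue = [], list(cites.get(doc, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(cites.get(current, []))
    return found

def cited_by(doc, cites=CITES):
    """Look up more recent documents that cite `doc`, as a citation index does."""
    return sorted(source for source, refs in cites.items() if doc in refs)

older = snowball("seed")
newer = cited_by("seed")
```

Starting from the seed document, `snowball` leads to older publications through reference lists, while `cited_by` leads forward in time to publications that cite the seed.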

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
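The variants listed above can be generated automatically. The sketch below is a minimal illustration that assumes the default page is called index.html or index.htm; the function name is hypothetical, and the resulting strings still have to be entered in the link-search syntax of the chosen search system.

```python
def url_variants(url):
    """List common spellings of the same address for link searches."""
    variants = [url]
    # Variant without the default page name, if one is present.
    for default_page in ("index.html", "index.htm"):
        if url.endswith("/" + default_page):
            variants.append(url[: -len(default_page)])
            break
    # Variant without the trailing slash.
    base = variants[-1]
    if base.endswith("/"):
        variants.append(base.rstrip("/"))
    return variants

variants = url_variants("//web-server-computer.country/website/index.html")
```

Searching for each variant in turn, and combining the results, reduces the risk of missing links that were written in a slightly different form.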

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 194

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
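The "simply stated" rule above can be sketched in a few lines: count the links each site receives and sort the directory entries accordingly. This is only an illustration of the stated idea, not Google's actual method; the function name and the example sites are invented.

```python
from collections import Counter

def rank_by_inlinks(links, directory_entries):
    """Order directory entries by the number of links each one receives."""
    inlinks = Counter(target for _source, target in links)
    # Sort by descending in-link count; ties fall back to alphabetical order.
    return sorted(directory_entries, key=lambda site: (-inlinks[site], site))

# Hypothetical (source, target) link pairs between three sites.
links = [("a.example", "c.example"), ("b.example", "c.example"),
         ("c.example", "a.example")]
print(rank_by_inlinks(links, ["a.example", "b.example", "c.example"]))
```

An alphabetical listing would show a.example first; here c.example comes first because it receives the most links.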

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
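The two components described above (an index built from collected pages, and a search function working on that index) can be sketched as follows. This is a minimal toy, assuming the pages have already been collected; the example URLs and texts are invented, and combining query words with AND is a choice made for this sketch only.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stands in for the pages collected by the "robot" software.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
}
index = build_index(pages)
print(search(index, "marine science"))
```

The query is answered from the stored index only; no page on the Internet is contacted at search time, which is why such systems can respond quickly.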

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi --
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
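The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a simplified link-analysis iteration in the spirit of the well-known PageRank idea. This is a teaching sketch under stated assumptions, not the algorithm as used by Google: the damping value, the number of iterations and the tiny link graph are all invented.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-based score: a page gains rank from pages linking to it."""
    pages = sorted({p for pair in links for p in pair})
    rank = {p: 1.0 / len(pages) for p in pages}
    outgoing = {p: [t for s, t in links if s == p] for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            # A page without outgoing links shares its rank with all pages.
            targets = outgoing[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: "hub" and "c" each receive two links,
# but "c" is linked from the well-linked page "hub".
links = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("d", "c")]
scores = link_rank(links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph "c" ends up above "hub" although both receive two links, because one of the links to "c" comes from a page that is itself well linked.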

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
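What such an automatic expansion amounts to can be imitated by hand, as in the sketch below: each word marked with a tilde is replaced by an OR-group of the word and its synonyms. The synonym table is a tiny, invented example; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical, very small synonym table for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            group = [word] + synonyms.get(word.lower(), [])
            # Only build a group when at least one synonym is known.
            expanded.append("(" + " OR ".join(group) + ")" if len(group) > 1 else word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("cheap ~car rental"))
```

Words without a tilde are passed through unchanged, so the searcher decides which concepts are broadened.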

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
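Recall and precision, mentioned above as the two measures of a result set, can be computed as in this small sketch. The document identifiers are invented; in practice the set of all relevant documents is rarely known exactly, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved items that are relevant;
    recall = share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here 2 of the 4 retrieved documents are relevant (precision 2/4) and 2 of the 3 relevant documents were found (recall 2/3).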

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
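To make clear what is missing, the sketch below shows what manual right-hand truncation does in systems that support it: a term such as librar* is expanded to all words in the index vocabulary that start with that stem. The asterisk as truncation symbol and the small vocabulary are assumptions for this example.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return [term]

# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", vocabulary))
```

Without this feature, the searcher has to type the word variants separately in the query.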

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
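The effect of a proximity operator such as NEAR can be sketched as a check on word positions: two words match only when they occur within a given number of positions of each other. The function name, the default distance and the sample sentence are invented for illustration.

```python
import re

def near(text, word_a, word_b, max_distance=3):
    """True if the two words occur within max_distance word positions."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

text = "Open access journals in marine science are indexed by several search engines"
print(near(text, "marine", "science"))
print(near(text, "open", "engines"))
```

A plain AND query would accept both word pairs, while the proximity check accepts only words that appear close together and are therefore more likely to be related in meaning.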

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
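The general idea of clustering results on the fly can be sketched with a very naive method: each retrieved snippet is placed under the most widely shared word that is not part of the query. This is not the algorithm of Vivisimo, which is not described here; the stopword list and the snippets are invented, and the query reuses the pascal example shown earlier for Teoma.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(query, snippets):
    """Group result snippets under the most common non-query word they contain."""
    query_words = set(re.findall(r"\w+", query.lower()))
    tokenized = []
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        tokenized.append(words - query_words - STOPWORDS)
    # Count in how many snippets each candidate label occurs.
    frequency = Counter(word for words in tokenized for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Pick the most widely shared word; break ties alphabetically.
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Pascal the philosopher and his wager",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, members)
```

Everything is computed from the retrieved snippets at query time, so no document has to be classified before the search.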

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
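The integration step described above can be sketched as follows: the ranked lists returned by several search systems are interleaved, and a result is dropped when the same address has already been seen. The URL normalisation rule and the example result lists are assumptions for this sketch; real meta-search systems may merge and rank quite differently.

```python
from itertools import zip_longest
from urllib.parse import urlsplit

def normalize(url):
    """Crude comparison key: ignore scheme, leading 'www.' and trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(*result_lists):
    """Interleave ranked lists from several engines, keeping first occurrences."""
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:  # this engine returned fewer results
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.example.org/a/", "http://www.example.com/"]
engine_b = ["http://example.com", "http://example.org/a", "http://www.example.net/"]
print(merge_results(engine_a, engine_b))
```

Five raw results are reduced to three distinct addresses, which is the "removal of repeated results" that saves the searcher from reading the same hit twice.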

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
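
How a tracking service can notice that a page has been modified can be sketched as follows. This is an assumption about one common approach, not a description of any particular service: the service stores a digest of the page text and compares it with a digest of the text fetched later. The names fingerprint and has_changed are invented here, and fetching the page is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored digest with the digest of the current text."""
    return fingerprint(current_text) != stored_fingerprint
```

For example, a service could store fingerprint(text) at the first visit and send an alert by email as soon as has_changed returns True at a later visit.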

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
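
A keyword-based interest profile can be illustrated with a small sketch. It is hypothetical and does not describe how Amazon or any other bookshop implements its alerts: the profile is a list of keywords, and a new title triggers an alert when it contains at least one of them as a whole word.

```python
def matches_profile(title, keywords):
    """True if any profile keyword occurs as a whole word in the title."""
    words = title.lower().split()
    return any(k.lower() in words for k in keywords)


def alerts(new_titles, keywords):
    """Select the new titles that fit the stored interest profile."""
    return [t for t in new_titles if matches_profile(t, keywords)]
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the words in its title.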

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Starting from a Term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
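
The procedure described above can be sketched as follows. The miniature thesaurus is invented for illustration (its entries are not taken from any real thesaurus system), and the OR syntax with quoted phrases is an assumption about the target database; check the help pages of the database for its actual query syntax.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal sea"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Add synonyms and narrower terms to a search term, joined with OR."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        terms.extend(entry.get(rel, []))
    return " OR ".join('"%s"' % t for t in terms)
```

Adding synonyms (UF) and narrower terms (NT) mainly increases recall; adding broader terms (BT) as well would widen the search further, at the cost of precision.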

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
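
Both directions can be illustrated with a small sketch. The data are hypothetical: REFERENCES maps each document to the older documents that it cites. Following these lists backwards gives the snowball effect; inverting them is, in essence, what a citation index offers.

```python
# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, refs):
    """Follow reference lists backwards in time, collecting older documents."""
    found, todo = [], list(refs.get(doc, []))
    while todo:
        current = todo.pop(0)
        if current not in found:
            found.append(current)
            todo.extend(refs.get(current, []))
    return found


def cited_by(doc, refs):
    """Invert the reference lists: which documents cite `doc`?"""
    return sorted(d for d, cited in refs.items() if doc in cited)
```

With these data, snowball("seed", REFERENCES) leads to the older documents, while cited_by("seed", REFERENCES) leads to the more recent ones.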

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
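
Generating these variants can be automated. The sketch below is an illustration under simple assumptions: the function name url_variants is invented here, only the default page name index.html is handled, and the resulting strings still have to be entered in the link search syntax of the chosen search system.

```python
def url_variants(url):
    """Return the spellings of a URL that may appear in links to it."""
    variants = [url]
    default_page = "index.html"
    # Variant without the default page name, keeping the trailing slash.
    if url.endswith("/" + default_page):
        variants.append(url[: -len(default_page)])
    # Variant without the trailing slash.
    base = variants[-1]
    if base.endswith("/"):
        variants.append(base.rstrip("/"))
    return variants
```

Searching for every variant and combining the results reduces the risk of missing pages that link to the same web site in a slightly different way.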

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically, by machine, on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
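
The mechanism described above can be sketched in a few lines. This is a toy illustration, not the implementation of any real search engine: the `pages` dictionary stands in for what the “robot” has collected, and the names `build_index` and `search` are hypothetical. The query is answered from the stored index, not by visiting the Internet in real time.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word points to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical pages, as if collected earlier by a crawler.
    pages = {
        "http://example.org/a": "Marine biology of the North Sea",
        "http://example.org/b": "Hydrology and marine science",
    }
    index = build_index(pages)
    print(search(index, "marine hydrology"))
```

Because only the index is consulted, a page that has changed or disappeared since the last visit of the robot can still appear in the results.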

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
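
The two ranking criteria above (many pages point to it, “important” pages point to it) can be illustrated with a simplified, PageRank-style iteration. This is a sketch under assumptions, not Google's actual algorithm: the function name `link_rank`, the damping value 0.85 and the number of rounds are illustrative choices.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages by the links they receive; links from high-scoring pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        # Every page gets a small base score, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its own score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical mini-web: a, b and d link to c; c links to a.
    web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(link_rank(web).items(), key=lambda item: -item[1]):
        print(page, round(score, 3))
```

In this mini-web, page c ranks highest because many pages point to it, and page a ranks above b and d because the “important” page c points to it.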

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
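
What such an automatic expansion amounts to can be sketched as follows. This is an illustration only: the synonym table is invented for the example, and how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical synonym table; a real system would use a large thesaurus.
SYNONYMS = {
    "fishing": ["angling", "fishery"],
    "car": ["automobile", "auto"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is searched as it is.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


if __name__ == "__main__":
    print(expand_query("sustainable ~fishing policy"))
```

The expanded query retrieves more documents (higher recall), but only for the words marked with a tilde and only as far as the synonym table reaches.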

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have a unique search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
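
The idea of clustering “on the fly” can be illustrated with a very simple heuristic that works only on the retrieved results. This is a toy sketch and certainly not the method used by Vivisimo, which is not described in these slides: each result is filed under the word that it shares with the largest number of other results. The function name, the stop word list and the example snippets are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "and", "by", "for", "in", "of", "the"}


def cluster_results(query: str, snippets: list[str]) -> dict[str, list[str]]:
    """Group result snippets under headings taken from the snippets themselves."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def candidates(text: str) -> set[str]:
        # Possible headings: words that are neither stop words nor query words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        return words - STOP_WORDS - query_words

    # Count in how many snippets each candidate word occurs.
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(candidates(snippet))

    clusters = defaultdict(list)
    for snippet in snippets:
        words = candidates(snippet)
        if words:
            # Most widely shared word wins; ties are broken alphabetically.
            heading = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            heading = "other"
        clusters[heading].append(snippet)
    return dict(clusters)


if __name__ == "__main__":
    hits = [
        "Blaise Pascal philosopher and mathematician",
        "Pascal programming language tutorial",
        "Essays by the philosopher Blaise Pascal",
        "Free Pascal compiler for the programming language",
    ]
    for heading, members in cluster_results("pascal", hits).items():
        print(heading, "->", members)
```

No document is processed before the search: the headings are derived from the hits of this one query, which is what allows a meta-search system to cluster results that come from several external indexes.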

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
User (In / Out)
↔ Client computer + multi-threaded Internet search client program
↔ Internet / WWW
↔ WWW server computers with Internet search systems
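The client-based scheme can be sketched as a small program that sends one query to several search systems in parallel threads. This is only an illustration: the two search functions are hypothetical stand-ins that return fixed URLs, and no real search system is contacted.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real Internet search systems;
# each one returns a list of result URLs for the query.
def search_system_1(query):
    return [f"http://example.org/a?q={query}", f"http://example.org/b?q={query}"]

def search_system_2(query):
    return [f"http://example.org/b?q={query}", f"http://example.org/c?q={query}"]

def multi_threaded_search(query, systems):
    """Send the same query to every search system in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        # pool.map keeps the result lists in the same order as `systems`.
        return list(pool.map(lambda system: system(query), systems))

results = multi_threaded_search("plankton", [search_system_1, search_system_2])
```

Each entry of `results` is the hit list of one search system, so the client program can still show which system produced which result.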

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
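The integration step with removal of repeated results can be sketched as follows. This is a simplified assumption about how a meta-search system might merge hit lists, not a description of any particular product: results are interleaved by rank, and two URLs are treated as the same hit when they differ only by a trailing slash.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Crude normalisation (an assumption): ignore a trailing slash.
                key = results[rank].rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

engine_a = ["http://example.org/x", "http://example.org/y"]
engine_b = ["http://example.org/y/", "http://example.org/z"]
merged = merge_results([engine_a, engine_b])
```

Here `merged` contains three results instead of four, because both engines returned the same page `y`.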

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
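Tracking modifications in an existing page can be sketched by storing a fingerprint of the page text and comparing it with a fingerprint of a later version. This is a minimal sketch under the assumption that the page text has already been downloaded; the page contents below are invented for illustration, and real alert services may work differently.

```python
import hashlib

def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def has_changed(old_fingerprint, new_page_text):
    """Compare a stored fingerprint with the current version of the page."""
    return fingerprint(new_page_text) != old_fingerprint

# Illustrative page contents (not taken from any real site).
stored = fingerprint("<html>Conference programme, version 1</html>")
unchanged = has_changed(stored, "<html>Conference programme, version 1</html>")
modified = has_changed(stored, "<html>Conference programme, version 2</html>")
```

Only the short fingerprint has to be kept between two checks, not the full page.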

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: ?

AIM: To find book titles published before 1990
RECOMMENDED SYSTEMS: ?

AIM: To find a book title through a title search
RECOMMENDED SYSTEMS: ?

AIM: To find the price of a book
RECOMMENDED SYSTEMS: ?

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
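The use of a thesaurus before searching can be sketched as a small query expansion step. The thesaurus fragment below is hypothetical and only illustrative; a real thesaurus system would supply the related terms. The sketch combines a term with its synonyms (UF) and narrower terms (NT) into one Boolean OR query.

```python
# Hypothetical thesaurus fragment; the entries are illustrative only.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "BT": ["water bodies"]},
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and its thesaurus relations."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; leaving out the broader terms (BT), as done here by default, helps to keep precision acceptable.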

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
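Both directions of citation searching can be sketched on a small citation graph. The document names and links below are invented for illustration. Following reference lists leads to older publications (the snow-ball effect), while a citation index answers the reverse question of which documents cite the seed.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found

def cited_by(doc, cites):
    """Citation-index lookup: which documents cite `doc`?"""
    return {source for source, refs in cites.items() if doc in refs}

earlier = snowball("seed", CITES)
later = cited_by("seed", CITES)
```

In this toy graph, `earlier` holds the three older documents reached from the seed and `later` holds the two newer documents that cite it.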

211

***-

Information retrieval:
using citations (scheme)
(Scheme along a time axis: Past, Now, Future)
Seed document
Snowball citation searching: leads to older documents (past)
Citation indexing: leads to more recent documents (future)

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
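The variants above can be generated automatically before running a link search. This is a small sketch; the `http:` prefix in the example address and the handling of `index.htm` are assumptions added for illustration, and the exact link-search syntax still depends on the search system used.

```python
def url_variants(url):
    """List common spellings of the same web address for a link search."""
    base = url
    for suffix in ("index.html", "index.htm"):
        if base.endswith("/" + suffix):
            # Remove the default file name, keep the directory part.
            base = base[: -len(suffix)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]

variants = url_variants("http://web-server-computer.country/website/index.html")
```

Each variant can then be submitted as a separate link query, and the results combined.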

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can NOT ONLY use an
Internet index search engine with specialised searching in
the hyperlink fields of web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
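
As a rough sketch, such a search for a URL as an exact phrase could be prepared as follows. The search form address `https://search.example/search` and the parameter name `q` are hypothetical placeholders, not the interface of any real search engine; the help pages of the chosen system give the real values.

```python
from urllib.parse import urlencode

def phrase_query_url(search_form: str, cited_url: str) -> str:
    """Build a search address that looks for cited_url as an exact phrase."""
    # Double quotes ask for an exact phrase in many (not all) search systems.
    phrase = f'"{cited_url}"'
    # Assumption: the search form takes the query in a parameter named "q".
    return search_form + "?" + urlencode({"q": phrase})

example = phrase_query_url(
    "https://search.example/search",  # hypothetical search form
    "http://www.computer.com/directory/subdirectory/web-page.html",
)
```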

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 196

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
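
The mechanism described above can be illustrated with a toy inverted index. This is only a sketch under simplifying assumptions: the pages are given as a small dictionary instead of being collected by a robot, words are split on white space, and a query returns the pages that contain all query words. The names `build_index` and `search` and the example addresses are made up for the illustration.

```python
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page addresses whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages standing in for what a robot would have collected.
pages = {
    "http://a.example/": "marine biology of the north sea",
    "http://b.example/": "marine engineering handbook",
}
index = build_index(pages)
```

The query is answered from the prepared index, not by visiting the pages at search time, which is the point made on the slide.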

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
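
The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be sketched with a simplified PageRank-style iteration. This is an illustration only, not Google's actual algorithm; the function name `link_rank`, the damping value 0.85 and the number of rounds are assumptions chosen for the example.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link structure: a and b both link to c, and c links to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy example page c ranks highest because two pages point to it, and a ranks above b because the highly ranked page c points to a while no page points to b.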

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
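
What such an expansion amounts to can be sketched as follows. The synonym table is a hypothetical stand-in (the table that Google uses is not public), and `expand_query` is an illustrative name: each query word becomes a group of alternatives, and only a word marked with a tilde receives extra alternatives.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[list[str]]:
    """Turn a query into groups: words within a group are alternatives (OR),
    and all groups must match (AND)."""
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([term])
    return groups
```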

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
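
Recall and precision, mentioned above, can be computed for a single search when the set of relevant documents is known. In practice this set is rarely known completely, so the sketch below is mainly useful for test collections; the function name and the document labels are made up for the example.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set (4 documents) and relevant set (3 documents).
precision, recall = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were retrieved (recall about 0.67).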

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
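
When a system offers no truncation, the searcher can expand a truncated term by hand and combine the variants with OR. A minimal sketch, assuming a word list is available (for instance from a dictionary or thesaurus); the function name `expand_truncation` and the use of `*` as truncation symbol are illustrative choices, not the syntax of any particular system.

```python
def expand_truncation(pattern: str, vocabulary: list[str]) -> list[str]:
    """Replace a right-truncated term such as 'librar*' by matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(word for word in vocabulary if word.lower().startswith(stem))

# Hypothetical word list.
variants = expand_truncation("librar*", ["library", "libraries", "librarian", "liberty"])
or_query = " OR ".join(variants)
```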

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
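
To illustrate what a proximity operator such as NEAR does, here is a small sketch that checks whether two words occur within a given number of words of each other in a text. The function name `near`, the default distance of 5 words and the simple punctuation stripping are assumptions for the example; systems that do offer NEAR define the distance in their own way.

```python
def near(text: str, word_a: str, word_b: str, max_distance: int = 5) -> bool:
    """True when word_a and word_b occur at most max_distance words apart."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

sentence = "Marine biology studies life in the ocean."
```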

96

**--

Meta-search systems:
scheme 1
[Diagram components: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram components: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram components: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.
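
The idea of clustering results on the fly can be illustrated with a deliberately simple sketch. This is NOT the method used by Vivisimo, which is not described here; it is only a toy approach that groups result titles under the word they share most often. The titles, the family name "Janssens" and the function name cluster_by_label are invented for illustration.

```python
from collections import Counter, defaultdict

# Very small stop word list, an assumption made for this sketch only.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "for", "on"}


def cluster_by_label(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_words = set(query.lower().split())

    def words(title):
        return [w for w in title.lower().split()
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate word occurs.
    counts = Counter(w for t in titles for w in set(words(t)))
    clusters = defaultdict(list)
    for title in titles:
        candidates = words(title)
        # Most widely shared word wins; ties are broken alphabetically.
        label = (min(candidates, key=lambda w: (-counts[w], w))
                 if candidates else "other")
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Janssens chemistry publications",
    "Janssens chemistry laboratory",
    "Janssens cycling results",
    "Janssens cycling team",
]
clusters = cluster_by_label(titles, "Janssens")
```

In this toy data set the two persons with the same family name end up under the separate headings "chemistry" and "cycling", without any pre-processing of the documents before the search.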

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out
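
The client-based scheme above can be sketched as a small program that sends one query to several search systems at the same time, each in its own thread. The two search functions below are hypothetical stand-ins that return fixed lists; a real client program would contact the WWW servers of the search systems instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems: each takes a query
# and returns a list of result URLs. No network access is performed.
def search_system_1(query):
    return [f"http://example.org/{query}/a", f"http://example.org/{query}/b"]


def search_system_2(query):
    return [f"http://example.org/{query}/b", f"http://example.net/{query}/c"]


def meta_search(query, systems):
    """Send the same query to every search system in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = [pool.submit(system, query) for system in systems]
        # Collect the result lists in the order the systems were given.
        return [future.result() for future in futures]


results = meta_search("tides", [search_system_1, search_system_2])
```

The user waits roughly as long as the slowest search system needs, instead of the sum of all waiting times.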

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
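
A minimal sketch of such an integration step, assuming that each primary search system returns a ranked list of URLs. The function name merge_results and the rule that two URLs are the same when they differ only by a trailing slash are assumptions made for this illustration; real meta-search systems may merge and rank quite differently.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        # Take the results at the same rank from every system in turn.
        for lst in result_lists:
            if rank < len(lst):
                key = lst[rank].rstrip("/")  # crude normalisation
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged


merged = merge_results([
    ["http://a.example/x", "http://b.example/"],
    ["http://b.example", "http://c.example/y"],
])
```

Here the address of b.example is reported by both systems, but it appears only once in the merged list.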

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
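
How such a tracking service can notice that a page has been modified may be sketched as follows. This is only an assumed, simplified approach (comparing a stored fingerprint of the page text with a fresh one); the slides do not state how any particular service works. Fetching the pages and sending the e-mail are left out, and the names fingerprint and changed_pages are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring outer whitespace."""
    return hashlib.sha256(page_text.strip().encode("utf-8")).hexdigest()


def changed_pages(stored, current_pages):
    """Return the URLs that are new or modified since the previous check.

    stored: dict mapping URL to the fingerprint of the previous check
    current_pages: dict mapping URL to the page text as it is now
    The dict `stored` is updated in place for the next check.
    """
    alerts = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if stored.get(url) != digest:
            alerts.append(url)
            stored[url] = digest
    return alerts


stored = {}
first = changed_pages(stored, {"http://example.org/news": "version 1"})
second = changed_pages(stored, {"http://example.org/news": "version 1"})
third = changed_pages(stored, {"http://example.org/news": "version 2"})
```

The first check reports the page as new, the second check reports nothing, and the third check reports the modification.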

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
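
A stored interest profile can be matched against records of new books as in the following sketch. The record fields (title, subjects), the sample books and the function names are assumptions for this illustration; they do not describe how Amazon or any other system works internally.

```python
def matches_profile(book, profile):
    """True if a profile keyword occurs in the title or a subject category matches."""
    title = book["title"].lower()
    keyword_hit = any(k.lower() in title for k in profile["keywords"])
    category_hit = bool(set(book["subjects"]) & set(profile["subjects"]))
    return keyword_hit or category_hit


def alerts_for(new_books, profile):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Invented sample data.
profile = {"keywords": ["mangrove"], "subjects": ["Marine biology"]}
new_books = [
    {"title": "Mangrove ecosystems", "subjects": ["Ecology"]},
    {"title": "Life in the deep sea", "subjects": ["Marine biology"]},
    {"title": "Medieval trade routes", "subjects": ["History"]},
]
alerts = alerts_for(new_books, profile)
```

The first book matches through a keyword, the second through a subject category, and the third does not match at all.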

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term, linked to:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
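
The procedure described above can be sketched as follows: look up a term in a thesaurus and add the terms found to the query. The thesaurus entry below is a tiny invented example, the function name expand_query is hypothetical, and the OR syntax is only an assumption, because the query syntax differs from one database to another.

```python
# Invented toy thesaurus, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["tide"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for found in entry.get(relation, []):
            if found not in terms:
                terms.append(found)
    # Terms that consist of several words are put between quotes.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall; choosing only the most suitable terms from the thesaurus helps precision.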

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
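
Both directions of citation searching can be sketched with a small table of reference lists. The document names and the table CITES are invented; in practice the reference lists come from the documents themselves, and the "cited by" information comes from a citation index.

```python
# Invented data: each document is mapped to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites, depth=2):
    """Follow reference lists back in time, up to `depth` steps."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for ref in cites.get(current, []):
                if ref not in found and ref != doc:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def cited_by(doc, cites):
    """What a citation index offers: the documents that cite `doc`."""
    return sorted(d for d, refs in cites.items() if doc in refs)


older_documents = snowball("seed", CITES)
newer_documents = cited_by("seed", CITES)
```

The first function moves from the seed document towards the past, the second one towards more recent publications.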

211

***-

Information retrieval:
using citations (scheme)
Scheme (time axis running from past to future):
Past <- snowball citation searching <- Seed document (now) -> citation indexing -> Future

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
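The variants listed above can be generated mechanically before a link search is run. The following is only a minimal sketch: the function name url_variants is hypothetical, and it assumes that the default page of the site is called index.html or index.htm, which is not true for every web server.

```python
def url_variants(url):
    """Return the spellings of a URL under which other pages may link to it."""
    # Drop the scheme (and any leading slashes) to get "host/path".
    rest = url.split("://", 1)[-1].lstrip("/")
    base = rest
    # Assumption: the default file name is index.html or index.htm.
    for default_name in ("index.html", "index.htm"):
        if base.endswith("/" + default_name):
            base = base[: -len(default_name)]
            break
    base = base.rstrip("/")
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system of your choice.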

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
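What such a "normal" search does can be sketched on a small, hand-made collection of page texts. The page addresses, the texts and the function name pages_mentioning are all hypothetical; a real search engine works on its own index, not on a Python dictionary.

```python
def pages_mentioning(url, pages):
    """Return the addresses of pages whose text contains the exact URL string."""
    needle = url.lower()
    return [address for address, text in pages.items() if needle in text.lower()]


# Hypothetical sample collection: address -> text of the page
sample_pages = {
    "http://example.org/links.html":
        "See http://www.computer.com/directory/subdirectory/web-page.html for details.",
    "http://example.org/other.html":
        "This page is about something else.",
}

target = "http://www.computer.com/directory/subdirectory/web-page.html"
print(pages_mentioning(target, sample_pages))
```

Unlike a search limited to hyperlink fields, this approach also finds pages that mention the URL as plain text without making a real link to it.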

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 197

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
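The two components described above can be illustrated with a very small sketch. It is not the implementation of any of the systems discussed here: the "robot" is replaced by a dictionary of pages that are assumed to have been collected already, the example.org addresses are hypothetical, and the index is a plain word-to-pages table.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, as if already collected by the robot
collected = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals",
}
index = build_index(collected)
print(sorted(search(index, "marine science")))
```

The query is answered from the prepared index only; the pages themselves are not visited at query time, which is why such systems can answer quickly but may be out of date.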

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a query could contain a maximum of 10 words.)
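The idea that a page ranks higher when many pages, and in particular "important" pages, point to it can be illustrated with a simplified iterative link analysis in the style of PageRank. This is only a sketch under stated assumptions: the function name link_rank, the damping value 0.85, the number of iterations and the tiny link graph are all chosen for illustration, and the real ranking of Google combines many more factors.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_rank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph page C receives the most links and scores highest, and page A scores higher than B or D because its single incoming link comes from the "important" page C.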

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
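What automatic expansion with synonyms amounts to can be sketched as follows. This is not how Google implements the tilde operator internally; the function name expand_query and the small synonym table are hypothetical and serve only to show the principle of turning one marked word into a group of alternatives.

```python
def expand_query(query, synonyms):
    """Replace each ~word by an OR group of the word and its listed synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


# Hypothetical synonym table
table = {"fish": ["fishes", "fishing"]}
print(expand_query("marine ~fish stocks", table))
```

Only the word marked with the tilde is expanded; the other words of the query are passed on unchanged.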

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
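Recall and precision, the two measures named above, can be computed for a single search when it is known which documents are relevant. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant); the document identifiers and the function name precision_recall are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are relevant
found = ["d1", "d2", "d3", "d4"]
wanted = ["d2", "d4", "d5"]
precision, recall = precision_recall(found, wanted)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A specialised system can improve both values at the same time, because its database contains fewer irrelevant documents and covers the subject more completely.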

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
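To show what manual truncation means, the sketch below matches a truncated term such as librar* against a small word list. The word list and the function name truncation_matches are hypothetical; the pattern matching relies on the standard Python module fnmatch.

```python
import fnmatch


def truncation_matches(pattern, vocabulary):
    """Return the words that match a truncated term such as 'librar*'."""
    return sorted(
        word for word in vocabulary
        if fnmatch.fnmatchcase(word.lower(), pattern.lower())
    )


# Hypothetical word list
words = ["library", "libraries", "librarian", "liberty"]
variants = truncation_matches("librar*", words)
print(" OR ".join(variants))
```

In a system without truncation, the searcher has to type such variants explicitly, for instance combined with OR, to obtain a comparable result.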

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
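What a proximity operator such as NEAR checks can be sketched in a few lines. The function name near, the default maximum distance of 5 words and the sample sentence are assumptions made for this illustration; search systems that offer NEAR each define their own distance.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in positions_a
        for j in positions_b
    )


sentence = "Marine biology is the study of life in the sea"
print(near(sentence, "marine", "biology"))
print(near(sentence, "marine", "sea"))
```

A proximity condition is stricter than a Boolean AND, which is satisfied as soon as both words occur anywhere in the document.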

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
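
The idea of clustering on the fly can be illustrated with a toy example that groups result titles at query time, without any index built beforehand. This is NOT the algorithm used by Vivisimo, which the slides do not describe. The heuristic (label each title with its most widely shared term) and all names below are assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on"}


def cluster_results(titles, query_words):
    """Group result titles under the term they share with most other titles."""
    skip = STOPWORDS | {w.lower() for w in query_words}

    def terms_of(title):
        return set(re.findall(r"[a-z]+", title.lower())) - skip

    # Count, for every term, the titles in which it occurs.
    term_to_titles = defaultdict(list)
    for title in titles:
        for term in terms_of(title):
            term_to_titles[term].append(title)

    clusters = defaultdict(list)
    for title in titles:
        # Ties are broken alphabetically because the candidates are sorted first.
        label = max(sorted(terms_of(title)),
                    key=lambda t: len(term_to_titles[t]),
                    default="other")
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
]
clusters = cluster_results(titles, query_words=["jaguar"])
```

The query words themselves are excluded from the labels, since they occur in every hit and would not separate the groups.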

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
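
The client-based scheme can be sketched as a program that sends one query to several search systems in parallel threads and collects the answers. The two engine functions below are stand-ins that return made-up URLs. A real client would contact each search system over the network, and nothing here reflects how Copernic is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def search_engine_a(query):
    # Hypothetical stub: a real version would contact search system A.
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def search_engine_b(query):
    # Hypothetical stub: a real version would contact search system B.
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the query to every engine in its own thread and gather the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


engines = {"A": search_engine_a, "B": search_engine_b}
results = meta_search("thesaurus", engines)
```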

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
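
Integration with removal of repeated results can be sketched as follows. The function name `merge_results` and the normalisation rule (ignore a trailing slash and letter case) are simplifying assumptions. Real systems may use other rules to decide when two hits are the same.

```python
def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping each URL once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                # Assumption: URLs differing only in trailing slash or case are equal.
                key = results[rank].rstrip("/").lower()
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.doaj.org/", "http://www.scirus.com"]
engine_2 = ["http://www.scirus.com/", "http://www.doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([engine_1, engine_2])
```

Interleaving by rank keeps the top hit of every engine near the top of the merged list.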

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
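
Tracking modifications of an existing page can be reduced to comparing a stored fingerprint of the page with a fingerprint of its current contents. The sketch below uses in-memory strings instead of fetching a page. The function names are hypothetical, and real alert services may compare pages in a more refined way.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored digest with the digest of the current page text."""
    return fingerprint(current_text) != stored_fingerprint


# Illustrative page contents; a real tracker would download the page periodically.
first_visit = "<html>Conference programme, version 1</html>"
stored = fingerprint(first_visit)

unchanged = has_changed(stored, first_visit)
changed = has_changed(stored, "<html>Conference programme, version 2</html>")
```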

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most recent two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning
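
The BT and NT relations in this scheme mirror each other: if term X has Y as broader term, then Y has X as narrower term. The sketch below derives the NT relation from a list of BT relations. The terms and the function name `derive_narrower` are illustrative assumptions, not taken from an existing thesaurus.

```python
from collections import defaultdict


def derive_narrower(broader_of):
    """Given term -> list of BT terms, derive the inverse NT relation."""
    narrower_of = defaultdict(list)
    for term, broader_terms in broader_of.items():
        for broader_term in broader_terms:
            narrower_of[broader_term].append(term)
    return dict(narrower_of)


# Illustrative BT relations (assumed for this example).
broader_of = {
    "inland sea": ["sea"],
    "sea": ["body of water"],
    "lake": ["body of water"],
}
narrower_of = derive_narrower(broader_of)
```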

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
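
The use of a thesaurus before searching can be sketched as query expansion: the searcher's term is combined with its synonyms (UF) and, optionally, its narrower terms (NT) into one OR query. The small thesaurus, the function name `expand_query` and the OR syntax are assumptions. The target database may require a different query syntax.

```python
def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and optionally its narrower terms."""
    entry = thesaurus.get(term, {})
    words = [term] + entry.get("UF", [])
    if include_narrower:
        words += entry.get("NT", [])
    # Multi-word terms are quoted so that they are searched as exact phrases.
    quoted = [f'"{w}"' if " " in w else w for w in words]
    return " OR ".join(quoted)


# Illustrative thesaurus entry (assumed, not copied from a real thesaurus).
thesaurus = {"sea": {"UF": ["ocean"], "NT": ["inland sea"], "BT": ["body of water"]}}
query = expand_query("sea", thesaurus)
```

Adding synonyms and narrower terms mainly raises recall. Leaving out the narrower terms keeps the query closer to the original concept.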

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
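
Both directions of citation searching can be sketched on a small table of reference lists. The document identifiers are invented for illustration, and the function names are hypothetical. A real citation index holds the same kind of information for millions of publications.

```python
def cited_by(seed, references):
    """Forward search (citation index): documents that cite `seed`."""
    return sorted(doc for doc, refs in references.items() if seed in refs)


def snowball(seed, references, depth=2):
    """Backward search: follow reference lists from the seed to older work."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        # Collect the references of the current frontier, skipping known documents.
        frontier = {r for doc in frontier for r in references.get(doc, [])} - found - {seed}
        found |= frontier
    return sorted(found)


# Invented example: each document maps to the list of documents it cites.
references = {
    "D2005": ["C2003", "B2001"],
    "C2003": ["B2001", "A1998"],
    "B2001": ["A1998"],
    "A1998": [],
}
older = snowball("C2003", references)
newer = cited_by("C2003", references)
```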

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document
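
The idea of counting links as citations can be sketched on a tiny, invented set of pages. The URLs and the link table below are assumptions made for illustration; a real search engine derives such a table from its own crawl.

```python
from collections import defaultdict

# Hypothetical mini-web: each page maps to the URLs it links to.
outgoing_links = {
    "http://a.example/": ["http://target.example/page.html", "http://b.example/"],
    "http://b.example/": ["http://target.example/page.html"],
    "http://c.example/": ["http://a.example/"],
}


def find_citing_pages(links):
    """Invert the link table: for every URL, list the pages that link to it."""
    citing = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore self-links
                citing[target].add(source)
    return citing


citing_pages = find_citing_pages(outgoing_links)
target = "http://target.example/page.html"
print(len(citing_pages[target]), sorted(citing_pages[target]))
```

The length of the set gives the number of web citations, and its members show who considered the page interesting enough to link to.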

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
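
A small sketch that generates the variants listed above from one known address. The function name and the extra default page name index.htm are assumptions; which default page names a web server accepts differs from site to site.

```python
def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the spellings under which the same page may be linked."""
    base = url
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # cut the file name, keep the slash
            break
    base = base.rstrip("/")
    # Directory form without and with trailing slash, then the explicit files.
    variants = [base, base + "/"]
    variants += [base + "/" + name for name in index_names]
    return variants


variants = url_variants("//web-server-computer.country/website/index.html")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link search, and the result sets combined.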

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
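
As a rough sketch, the exact-phrase search above can be prepared as follows. The search form address http://search.example/search?q= is purely hypothetical; the real address and parameter name depend on the search system used.

```python
from urllib.parse import quote_plus


def exact_phrase_query(url):
    """Wrap a URL in double quotes so that it is searched as an exact phrase."""
    return '"' + url + '"'


def search_address(query, search_form="http://search.example/search?q="):
    """Build the address of a results page; the search form is hypothetical."""
    return search_form + quote_plus(query)


query = exact_phrase_query(
    "http://www.computer.com/directory/subdirectory/web-page.html"
)
address = search_address(query)
print(query)
print(address)
```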

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 198

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: the complete WWW; a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
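
The two components described above can be illustrated with a toy example: an index built automatically from the text of collected pages, and a search function that consults only that index. The pages and URLs are invented, and real systems add ranking, phrase searching and much more; this is only a sketch of the principle.

```python
import re
from collections import defaultdict

# Hypothetical pages that a "robot" has already collected (URL -> text).
collected_pages = {
    "http://a.example/": "Marine biology and oceanography resources",
    "http://b.example/": "Open access journals for marine science",
    "http://c.example/": "Hydrology data and open archives",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(collected_pages)
print(sorted(search(index, "marine open")))
```

The search never touches the pages themselves, only the index, which is why such a system answers quickly but can be out of date.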

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web, the other leading
Internet database and search engine AltaVista, and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
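
The two ranking rules above can be illustrated with a simplified, textbook-style link analysis. This is only a sketch and not the algorithm that Google actually runs; the page names, the damping value of 0.85 and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page gains score from the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its own score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical pages: C receives many links; A receives one link from C;
# E receives one link from the little-cited page B.
links = {
    "A": ["C"],
    "B": ["C", "E"],
    "C": ["A"],
    "D": ["C"],
    "E": ["C"],
}
rank = link_rank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this example C ranks highest because many pages point to it, while A ranks above E although both receive a single link, because the link to A comes from the "important" page C.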

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
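
What such an expansion amounts to can be sketched as follows. The synonym table is invented and the way Google selects synonyms internally is not documented here; the sketch only shows the principle of replacing a marked word by an OR-group.

```python
# Hypothetical synonym table; a real search engine uses its own, far larger data.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace every ~word by an OR-group of the word and its known synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                expanded.append(word)  # no synonyms known: keep the word
        else:
            expanded.append(term)
    return " ".join(expanded)


expanded_query = expand_query("used ~car prices")
print(expanded_query)
```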

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
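
Recall and precision, mentioned above, follow the standard definitions: recall is the share of all relevant documents that was retrieved, and precision is the share of the retrieved documents that is relevant. A minimal sketch with invented document identifiers:

```python
def precision_and_recall(retrieved, relevant):
    """Compute precision and recall for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents relevant in total.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc5"]
precision, recall = precision_and_recall(retrieved, relevant)
print(precision, recall)
```

In practice the full set of relevant documents is rarely known, so recall can usually only be estimated.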

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
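
When a system offers no truncation, the searcher can write out the word variants and combine them with OR. The sketch below does this from a word list; the vocabulary is invented, and the * notation is only the convention used in this example.

```python
def expand_truncation(pattern, vocabulary):
    """Turn a truncated term such as 'librar*' into an explicit OR-query."""
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    matches = sorted(
        word for word in set(vocabulary) if word.lower().startswith(stem)
    )
    # Fall back to the bare stem when the word list has no matching word.
    return " OR ".join(matches) if matches else stem


# Hypothetical word list; in practice the searcher thinks of the variants.
vocabulary = ["library", "libraries", "librarian", "liberty", "literature"]
or_query = expand_truncation("librar*", vocabulary)
print(or_query)
```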

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
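
To give an idea of what clustering results "on the fly" can mean, here is a deliberately simple Python sketch. It is NOT the method used by Vivisimo, which is not described here; it only groups result titles under the word that most of them share. The stop-word list, function names and example titles are hypothetical.

```python
from collections import Counter, defaultdict

# Small hypothetical stop-word list; a real system would use a longer one.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "at", "on"}


def tokens(text, query_terms):
    """Lower-cased words of a title, without stop words and query words."""
    words = {w.strip(".,:;!?()").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS and w not in query_terms}


def cluster_results(titles, query):
    """Group result titles under the most widely shared non-query word."""
    query_terms = {q.lower() for q in query.split()}
    token_sets = [tokens(t, query_terms) for t in titles]
    doc_freq = Counter(w for ts in token_sets for w in ts)
    clusters = defaultdict(list)
    for title, ts in zip(titles, token_sets):
        # Pick the word shared by most results; break ties alphabetically.
        label = min(ts, key=lambda w: (-doc_freq[w], w)) if ts else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for a search on a family name.
titles = [
    "Janssens chemistry laboratory",
    "Publications of Janssens in chemistry",
    "Janssens cycling results",
    "Cycling team: Janssens wins",
]
for label, group in cluster_results(titles, "Janssens").items():
    print(label, "->", group)
```

Only the retrieved results are analysed, after the search; nothing is pre-processed before the query is known.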

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
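
The integration step can be sketched as follows in Python: ranked lists from several primary systems are interleaved, and a URL that was already seen is dropped. The normalisation rule and the example URLs are assumptions chosen for illustration; real meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    path without trailing slash. A deliberately simple rule."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
engine_a = ["http://www.example.org/ocean/", "http://example.org/tides"]
engine_b = ["http://example.org/ocean", "http://example.net/waves"]
print(merge_results([engine_a, engine_b]))
```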

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
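
The principle behind such tracking can be sketched in Python: store a fingerprint of each page at every check and compare it with the fingerprint of the newly fetched text. This is only an illustration under assumptions: fetching the pages and sending the email alert are left out, and the function names and URLs are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page content, ignoring whitespace layout."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint stored at the last check
    current:  dict mapping URL -> page text fetched now
    Returns (modified_urls, new_urls), each sorted.
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return sorted(modified), sorted(new)


# Hypothetical state from the last check and the texts fetched today.
previous = {
    "http://example.org/a": fingerprint("Tide tables 2004"),
    "http://example.org/b": fingerprint("Staff list"),
}
current = {
    "http://example.org/a": "Tide tables 2005",
    "http://example.org/b": "Staff   list",
    "http://example.org/c": "New project page",
}
print(changed_pages(previous, current))
```

In this example page a is reported as modified, page b as unchanged (only the spacing differs) and page c as new.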

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops selling books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
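
A minimal Python sketch of this use of a thesaurus: the searcher looks up a term, collects the synonyms (UF) and narrower terms (NT), and builds a query from them. The mini-thesaurus, the function name and the OR syntax of the target database are assumptions made only for this illustration.

```python
# Hypothetical mini-thesaurus; a real one would come from a published source.
THESAURUS = {
    "sea": {
        "BT": ["water body"],    # broader term
        "NT": ["coastal sea"],   # narrower term
        "RT": ["tide"],          # related term
        "UF": ["ocean"],         # non-preferred synonym
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found through the
    chosen thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    # Put multi-word terms between quotes so they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus mainly increases precision.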

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
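
Both directions can be illustrated with a small Python sketch on a hypothetical set of reference lists: following the references of the seed leads to older work (the snowball effect), while inverting the lists, as a citation index does, leads to more recent work. All document identifiers and function names are invented for the example.

```python
def backward_search(seed, references, depth=2):
    """Follow reference lists from the seed to older work (snowball effect)."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        cited = {ref for doc in frontier for ref in references.get(doc, [])}
        frontier = cited - found - {seed}
        found |= frontier
    return found


def forward_search(seed, references):
    """Find documents whose reference lists contain the seed,
    which is what a citation index makes possible."""
    return {doc for doc, refs in references.items() if seed in refs}


# Hypothetical reference lists: document -> documents it cites.
references = {
    "Seed2000": ["A1995", "B1998"],
    "A1995": ["C1990"],
    "B1998": ["C1990", "D1992"],
    "E2003": ["Seed2000"],
    "F2004": ["Seed2000", "E2003"],
}
print(sorted(backward_search("Seed2000", references)))
print(sorted(forward_search("Seed2000", references)))
```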

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
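
A minimal sketch of how such variants could be generated before running a link search. The function name url_variants and the default document names index.html and index.htm are assumptions made for this illustration; real servers may use other default names.

```python
def url_variants(url):
    """Return the forms under which the same page is commonly linked."""
    variants = {url}
    # Assumption: the server treats these names as the default document.
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            base = url[: -len(index_name)]   # keeps the trailing slash
            variants.add(base)
            variants.add(base.rstrip("/"))
    if url.endswith("/"):
        variants.add(url.rstrip("/"))
    return sorted(variants)

for variant in url_variants("//web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search system.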

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 199

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and became
one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
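
The two components described above can be sketched as follows. This is a simplified illustration under stated assumptions: the pages are hypothetical and already collected (no real crawling is done), and the names build_index and search are invented for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected.
PAGES = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Science citation index",
    "http://example.org/c": "Fishing and marine biology",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing all query words (implicit Boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "marine science")))
```

The query is answered from the prebuilt index only, which is why a search does not have to visit the original computers in real time.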

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
is owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
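
The idea that a page ranks higher when many pages, and especially important pages, point to it can be illustrated with a simplified PageRank-style iteration. This is a sketch only, not the actual algorithm used by Google: the link data are hypothetical, and the function name link_rank, the damping value 0.85 and the number of iterations are assumptions chosen for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: a, b and d link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(LINKS)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this small example page c receives the most links and ends up first; page a ranks above b and d because its single incoming link comes from the highly ranked page c.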

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
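
What such an automatic expansion amounts to can be sketched as follows. This is a hedged illustration, not a description of how Google implements the tilde operator: the synonym table is hypothetical and the function name expand_query is invented for this sketch.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)   # no known synonyms: keep the word as is
        else:
            parts.append(token)
    return " ".join(parts)

print(expand_query("rent ~car brussels"))
```

Only the words marked with a tilde are expanded; the other words in the query are left unchanged.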

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
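
Recall and precision of a result set can be computed as in the sketch below. The document identifiers are hypothetical and the function name precision_recall is an assumption made for this illustration; in practice the set of all relevant documents is rarely known exactly.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set of a search and hypothetical relevance judgements.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 3 relevant documents were found (recall about 0.67).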

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
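
To illustrate what manual truncation means, here is a minimal sketch of how a retrieval system could interpret a query term that ends in `*`. The function names and the sample sentence are assumptions made for this example; the code does not describe how any particular search engine works.

```python
import re


def truncation_to_regex(term):
    """Translate a query term, optionally ending in '*', into a regular expression."""
    if term.endswith("*"):
        # Truncated term: match every word that starts with the given stem.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    # No truncation: match the exact word only.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matching_words(term, text):
    """Return the distinct words in text that satisfy the query term."""
    pattern = truncation_to_regex(term)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


text = "The library catalogue lists libraries and librarians."
print(matching_words("librar*", text))  # truncated query
print(matching_words("library", text))  # exact query
```

Without truncation or stemming, the searcher has to type each word variant explicitly and combine them with OR.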

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
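
A proximity operator such as NEAR checks whether two words occur close to each other in a document. The sketch below shows one possible interpretation, counting the distance in words; the function name, the default distance and the sample text are assumptions for illustration only.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


doc = ("Open access journals are listed in a directory; "
       "commercial journals require payment for access.")
print(near(doc, "open", "journals", max_distance=2))
print(near(doc, "directory", "payment", max_distance=2))
```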

96

**--

Meta-search systems:
scheme 1
[Scheme with the following elements: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Scheme with the following elements: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Scheme with the following elements: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
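
The idea of clustering search results on the fly can be sketched in a few lines. This is NOT the method used by Vivisimo, which is not described here; it is a deliberately simple stand-in that groups result snippets under the word they share most often with the other results. The stopword list, the function name and the sample hits are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stopword list for this example only.
STOPWORDS = {"a", "an", "and", "at", "by", "in", "of", "on", "the", "to"}


def cluster_results(snippets, query):
    """Group snippets under the most widely shared word, computed at search time."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"\w+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_terms)
    # Count in how many snippets each word occurs.
    doc_freq = Counter(word for words in tokenised for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented hits for a family-name search, for illustration only.
hits = [
    "Smith publishes chemistry paper",
    "Chemistry lecture by Smith",
    "Smith wins cycling race",
    "Cycling team signs Smith",
]
clusters = cluster_results(hits, "Smith")
for label, members in clusters.items():
    print(label, members)
```

No index of the documents is prepared in advance: the grouping is derived only from the result list returned for the query, which is what "on the fly" means in the bullet above.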

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme with the following elements: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
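
The integration step can be sketched as follows: the query is sent to several search systems in parallel, and the result lists are merged while repeated results are removed. The two "engines" below are hypothetical stand-ins that return fixed lists; no real search service is contacted, and the URL normalisation rule is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def engine_a(query):
    """Hypothetical stand-in for a primary search system."""
    return ["http://www.example.org/page1", "http://example.org/page2/"]


def engine_b(query):
    """Hypothetical stand-in for a second primary search system."""
    return ["http://example.org/page1", "http://www.example.org/page3"]


def normalise(url):
    """Simplified key for spotting repeated results (assumption: 'www.' and a
    trailing slash do not distinguish pages)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel, then merge the results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


print(meta_search("open access", [engine_a, engine_b]))
```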

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: sources accessible through the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
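
The difference between a modified page and a new page can be sketched with stored fingerprints. This is only an illustration of the principle, not the method of any particular alert service; the URLs and page texts are hypothetical, and fetching the pages from the WWW is left out on purpose.

```python
import hashlib


def fingerprint(page_text):
    """Compact fingerprint of a page; it changes whenever the text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with the pages as they are now.

    previous: dict mapping URL -> fingerprint saved at the last visit
    current:  dict mapping URL -> page text retrieved now
    """
    report = {"new": [], "modified": [], "unchanged": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(text):
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical data, for illustration only.
stored = {
    "http://example.org/a": fingerprint("Conference in May"),
    "http://example.org/b": fingerprint("Library opening hours"),
}
now = {
    "http://example.org/a": "Conference in June",
    "http://example.org/b": "Library opening hours",
    "http://example.org/c": "New open access journal",
}
report = detect_changes(stored, now)
print(report)
```

An alert service would run such a comparison at regular intervals and send the "new" and "modified" lists to the subscriber by email.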

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
» BT (= Broader Term): term(s) with broader meaning
» NT (= Narrower Term): term(s) with narrower meaning
» RT (= Related Term): other term(s)
» UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
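
The procedure described above can be sketched as a small query expansion step. The thesaurus entry below is invented for this example (it is not taken from any of the thesaurus systems mentioned), and the choice to add only UF and NT terms is an assumption; a searcher may prefer other relations.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["marine science"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms linked to it."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "NT", "RT")))
```

Adding synonyms and narrower terms mainly raises recall; choosing the most specific suitable term instead of a vague one mainly raises precision.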

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
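
Both directions of citation searching can be sketched with a small table of reference lists. The document identifiers below are hypothetical; a real citation index is of course built from the reference lists of many thousands of publications.

```python
# Hypothetical reference lists: each document maps to the documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
}


def build_citation_index(references):
    """Invert the reference lists: for each document, which documents cite it?"""
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)
    return cited_by


def snowball(seed, references):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, stack = [], [seed]
    while stack:
        doc = stack.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.append(cited)
                stack.append(cited)
    return sorted(found)


citation_index = build_citation_index(REFERENCES)
print("Older publications:", snowball("seed_2000", REFERENCES))
print("More recent publications:", sorted(citation_index.get("seed_2000", [])))
```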

211

***-

Information retrieval:
using citations (scheme)
[Scheme on a time axis (Past, Now, Future):
starting from the seed document,
snowball citation searching leads to the past,
citation indexing leads to the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
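
Once a citation search has returned pairs of citing and cited articles, the counts per article and per author follow by simple tallying. A minimal sketch, assuming invented article identifiers and an invented authorship table:

```python
from collections import Counter

# Hypothetical result of a citation search: (citing article, cited article) pairs.
citation_pairs = [
    ("p4", "p1"), ("p5", "p1"), ("p5", "p2"), ("p6", "p1"), ("p6", "p3"),
]
# Hypothetical authorship of the cited articles.
author_of = {"p1": "Author A", "p2": "Author A", "p3": "Author B"}

# Number of citations received by each article.
per_article = Counter(cited for _, cited in citation_pairs)
# Number of citations received by each author, summed over that author's articles.
per_author = Counter(author_of[cited] for _, cited in citation_pairs)
```

The result is only an estimate: it can be no more complete than the database in which the citation search was carried out.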

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
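
Generating the variants can be automated. A minimal sketch, assuming that the default page of the site is named `index.html` or `index.htm` (an assumption; servers may use other names):

```python
def url_variants(url):
    """Return the common spellings under which the same page may be linked."""
    base = url.rstrip("/")
    # Assumption: the default page is called index.html or index.htm.
    for index_name in ("/index.html", "/index.htm"):
        if base.endswith(index_name):
            base = base[: -len(index_name)]
            break
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/index.html")
```

Each variant is then submitted as a separate link search and the result sets are combined.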

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
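
The "simply stated" version of this ranking can be sketched as follows. The link data and site names are invented, and the real link analysis by Google is more refined than a plain count:

```python
from collections import Counter

# Hypothetical link data: (linking site, linked site) pairs.
links = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("s1", "b"), ("s2", "c"), ("s3", "c")]

# One directory category, listed alphabetically as in DMOZ.
directory_category = ["a", "b", "c"]

inlinks = Counter(target for _, target in links)
# Sort by number of received links, highest first; ties keep alphabetical order.
ranked = sorted(directory_category, key=lambda site: -inlinks[site])
```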

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Scheme: within the complete WWW, a global subject directory
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
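
The mechanism can be sketched in a few lines. This is a minimal illustration with invented pages standing in for what the robot has collected; real systems add ranking, updating and much more:

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
pages = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data and marine science journals",
    "http://example.org/c": "Open access journals in biology",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all query words (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
hits = search(index, "marine journals")
```

The query is answered from the prebuilt index only; no page is fetched from the Internet at search time.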

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
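
The idea that links from “important” pages count for more can be sketched as an iterative calculation. This is a simplified variant of the published PageRank idea on an invented toy web, not Google's actual ranking; the function name, the damping value and the page names are assumptions.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Iteratively score pages; a link from a high-scoring page counts for more."""
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical toy web: page -> pages it links to.
graph = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": [],
}
scores = link_rank(graph)
```

In this toy web, pages "a" and "d" each receive links from exactly one page, but "a" scores higher because its single link comes from the highly linked "hub".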

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
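
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus is invented, and the output assumes a search system that accepts parentheses and an OR operator, which is not true for every system:

```python
# Hypothetical mini-thesaurus; a real one would be far larger.
thesaurus = {
    "fishing": ["fisheries", "angling"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, thesaurus):
    """Replace each known word by an OR group of the word and its synonyms."""
    parts = []
    for word in query.split():
        synonyms = thesaurus.get(word.lower(), [])
        if synonyms:
            parts.append("(" + " OR ".join([word] + synonyms) + ")")
        else:
            parts.append(word)
    return " ".join(parts)

expanded = expand_query("sea fishing statistics", thesaurus)
```

The expanded query retrieves documents that use any of the listed terms, which increases recall for the same search concept.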

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/
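
How categorization helps with an ambiguous query can be illustrated with a naive sketch. The result titles and the cue words per category are invented; real systems derive the categories automatically from the retrieved documents.

```python
from collections import defaultdict

# Hypothetical result titles for the ambiguous query "pascal".
results = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal compiler download",
    "Pascal's Wager explained",
]

# Hypothetical cue words per category; real systems derive clusters automatically.
cues = {
    "Philosopher": {"blaise", "philosopher", "wager"},
    "Programming language": {"programming", "compiler", "language"},
}

def categorize(results, cues):
    """Assign each result to the first category sharing a cue word with it."""
    groups = defaultdict(list)
    for title in results:
        words = set(title.lower().replace(",", " ").split())
        label = next((c for c, cue in cues.items() if words & cue), "Other")
        groups[label].append(title)
    return dict(groups)

groups = categorize(results, cues)
```

The user can then pick the group that matches the intended meaning instead of scanning one mixed list.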

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.
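
Combining the results of several search systems, as a meta-search engine does, can be sketched as follows. The result lists are invented, and treating URLs that differ only in a trailing slash as the same page is an assumption:

```python
def merge_results(*result_lists):
    """Combine ranked URL lists from several engines, dropping duplicates."""
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            # Assumption: a trailing slash does not change the page addressed.
            key = url.rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical result lists from two engines with partly overlapping coverage.
engine_a = ["http://example.org/x", "http://example.org/y/"]
engine_b = ["http://example.org/y", "http://example.org/z"]
combined = merge_results(engine_a, engine_b)
```

URLs are deliberately not lowercased, because the path of a URL can be case sensitive.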

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
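
Recall and precision can be made concrete with a small calculation. The sketch below assumes that the complete set of relevant documents is known, which is rarely the case in practice; the function name and the document identifiers are illustrative only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 of them relevant,
# while 6 relevant documents exist in total.
p, r = precision_recall({"d1", "d2", "d3", "d9"},
                        {"d1", "d2", "d3", "d4", "d5", "d6"})
```

In this made-up case the precision is 0.75 and the recall is 0.5.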

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
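
When a search system offers no truncation, the searcher can write out the word variants and combine them with OR. The sketch below is one possible workaround, not a feature of any search system; it assumes that the searcher supplies the word list, and the function name is hypothetical.

```python
def expand_truncation(stem, vocabulary):
    """Emulate the truncated query 'stem*' as an explicit OR query.

    `vocabulary` is a word list supplied by the searcher (an assumption:
    the search system itself does not provide one).
    """
    variants = sorted({word.lower() for word in vocabulary
                       if word.lower().startswith(stem.lower())})
    return " OR ".join(variants)


query = expand_truncation("librar", ["library", "libraries", "librarian",
                                     "liberal", "Library"])
```

Here `query` becomes `librarian OR libraries OR library`, which can then be typed into the search interface.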

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
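
What a proximity operator such as NEAR does can be illustrated with a small local filter that a searcher could apply to texts that were already retrieved. This is only a sketch under that assumption; the function name and the default distance of 5 words are arbitrary choices.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Blaise Pascal was a French philosopher"
close = near(sentence, "pascal", "philosopher")                   # 4 words apart
too_far = near(sentence, "pascal", "philosopher", max_distance=2)
```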

96

**--

Meta-search systems:
scheme 1
Scheme (diagram): User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems

97

**--

Meta-search systems:
scheme 2
Scheme (diagram): User (In / Out) ↔ Client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word also has other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Scheme (diagram): User (In / Out) ↔ Client computer + WWW client program ↔ WWW server computer ↔ Internet / WWW ↔ WWW server computers with Internet search systems

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
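
Clustering "on the fly" can be illustrated with a very simple approach: group the retrieved snippets under the words that occur most often together with the query. The sketch below is NOT the method used by Vivisimo, which is not described here; the stop word list, the function name and the sample snippets are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was"}


def cluster_results(query, snippets, max_clusters=3):
    """Group snippets under the most frequent words that accompany the query."""
    query_words = set(query.lower().split())
    tokenised = [set(re.findall(r"[a-z]+", s.lower())) - STOPWORDS - query_words
                 for s in snippets]
    # Count in how many snippets each candidate heading occurs.
    doc_freq = Counter(word for words in tokenised for word in words)
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    headings = [word for word, _ in ranked[:max_clusters]]
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(snippet)
    return dict(clusters)


clusters = cluster_results("pascal", [
    "Pascal programming language tutorial",
    "Blaise Pascal, French philosopher",
    "The Pascal language reference",
    "Pascal the philosopher and mathematician",
], max_clusters=2)
```

For the ambiguous query "pascal", the made-up snippets end up under the headings "language" and "philosopher". Nothing is processed before the search; only the retrieved results are analysed.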

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Scheme (diagram): User (In / Out) ↔ Client computer + multi-threaded Internet search client program ↔ Internet / WWW ↔ WWW server computers with Internet search systems

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
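
The integration step can be sketched as follows: the ranked result lists of the primary search systems are interleaved, and a result is dropped when its address was already seen. This is a minimal illustration under simple assumptions (addresses are compared by host and path only, the query string is ignored); the function names are hypothetical and do not describe any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    followed by the path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


system_1 = ["http://www.vub.ac.be/BIBLIO/", "http://www.doaj.org/"]
system_2 = ["http://doaj.org", "http://www.copac.ac.uk/"]
merged = merge_results(system_1, system_2)
```

The two spellings of the DOAJ address are recognised as the same result, so `merged` contains three entries instead of four.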

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
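
Tracking changes in an existing page can be sketched as follows: a fingerprint of the page text is stored, and at the next visit the new fingerprint is compared with the stored one. This is a minimal illustration, not the method of any particular alert service; fetching the page is left out, and the function names are hypothetical.

```python
import hashlib


def fingerprint(page_content):
    """Return a fingerprint of the page text, ignoring whitespace changes."""
    normalised = " ".join(page_content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, new_content):
    """Compare the stored fingerprint with the current page content."""
    return fingerprint(new_content) != old_fingerprint


stored = fingerprint("Open access  sources\n")
same_page = has_changed(stored, "Open access sources")
updated_page = has_changed(stored, "Open access sources, updated")
```

Only the fingerprint has to be stored, not the page itself. Finding completely new pages requires a different approach, for instance repeating a stored search query.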

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons across the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
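
Matching new books against a stored interest profile can be sketched as follows. The record layout (a title plus a list of subjects) and the function name are assumptions for illustration; they do not describe the system of any bookshop.

```python
def new_book_alerts(new_books, profile):
    """Return the titles of new books that match a stored interest profile.

    `profile` holds lower-case keywords and subject categories; each book
    is a dict with 'title' and 'subjects' (a hypothetical record layout).
    """
    alerts = []
    for book in new_books:
        title_words = set(book["title"].lower().split())
        subjects = {subject.lower() for subject in book["subjects"]}
        if title_words & profile["keywords"] or subjects & profile["subjects"]:
            alerts.append(book["title"])
    return alerts


profile = {"keywords": {"aquaculture"}, "subjects": {"marine biology"}}
alerts = new_book_alerts([
    {"title": "Aquaculture in Europe", "subjects": ["Fisheries"]},
    {"title": "Life in the Sea", "subjects": ["Marine biology"]},
    {"title": "Cooking for Beginners", "subjects": ["Food"]},
], profile)
```

The first book matches through a keyword in the title and the second through its subject category; the third one does not match.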

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
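
Simultaneous searching can be sketched with a thread pool that sends the same query to several target systems in parallel. The target functions below are stand-ins (assumptions) for real catalogue connections; a target that is not available yields an empty result, which illustrates why the result depends on the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, targets):
    """Send one query to several catalogues in parallel.

    `targets` maps a catalogue name to a search function; a target that
    fails is reported with an empty result instead of stopping the search.
    """
    def safe_search(search):
        try:
            return search(query)
        except Exception:
            return []

    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        futures = {name: pool.submit(safe_search, search)
                   for name, search in targets.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical target systems, for illustration only.
def library_a(query):
    return [f"{query}: title held by library A"]


def bookdealer_b(query):
    raise TimeoutError("target system unavailable")


results = search_all("hydrology", {"library A": library_a,
                                   "bookdealer B": bookdealer_b})
```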

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: To find book titles about a specific subject / topic
RECOMMENDED SYSTEMS: Library of Congress, British Library, (Amazon)

AIM: To search for book titles published before 1990
RECOMMENDED SYSTEMS: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

AIM: Book title search in general
RECOMMENDED SYSTEMS: Library of Congress, British Library, Infoball

AIM: To find the price of a book
RECOMMENDED SYSTEMS: Global Books in Print, Infoball, online bookshops

AIM: To be informed regularly about new books
RECOMMENDED SYSTEMS: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be sufficient.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
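
Using a thesaurus before searching can be sketched as follows: the terms found through the thesaurus relations are combined with the original term into one query. The thesaurus fragment below is invented for illustration, and the function name is hypothetical.

```python
# Hypothetical thesaurus fragment: UF = synonyms, NT = narrower terms,
# BT = broader terms.
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "BT": ["water body"]},
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus selected thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
```

Here `query` becomes `sea OR ocean OR "coastal waters"`. Adding synonyms and narrower terms mainly increases recall; broader terms are left out by default because they tend to lower precision.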

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
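
The two directions described above can be sketched on a small citation graph. This is a minimal sketch with invented data: the REFERENCES dictionary and both function names are assumptions for illustration, not the interface of any real citation index.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed, references):
    """Follow reference lists back in time and collect all older documents reachable from the seed."""
    found, to_visit = set(), [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def citing_documents(seed, references):
    """Look up, as a citation index does, the more recent documents that cite the seed."""
    return {doc for doc, cited in references.items() if seed in cited}


older = snowball("seed", REFERENCES)
newer = citing_documents("seed", REFERENCES)
```

Here snowball follows the references of references as well (the snowball effect), while citing_documents only works when someone has indexed the citations of the more recent documents.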

211

***-

Information retrieval:
using citations (scheme)
[Scheme: a time axis (Past, Now, Future) with the seed document; snowball citation searching points towards the past, citation indexing towards the future.]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
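
Once a citation search has produced a list of citing and cited items, the counts per journal article and per author follow from simple tallying. The sketch below assumes invented data: CITATIONS, AUTHOR_OF and the function names are hypothetical, and a real count is only as complete as the citation database behind it.

```python
from collections import Counter

# Hypothetical (citing article, cited article) pairs, as a citation search might return them.
CITATIONS = [
    ("paper_4", "paper_1"),
    ("paper_5", "paper_1"),
    ("paper_5", "paper_2"),
    ("paper_6", "paper_3"),
]
# Hypothetical mapping of each cited article to its author.
AUTHOR_OF = {"paper_1": "Author A", "paper_2": "Author A", "paper_3": "Author B"}


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, author_of):
    """Count the citations received by each author, summed over that author's articles."""
    return Counter(author_of[cited] for _citing, cited in citations)


article_counts = citations_per_article(CITATIONS)
author_counts = citations_per_author(CITATIONS, AUTHOR_OF)
```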

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
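
The variants listed above can be generated rather than typed by hand. A minimal sketch, in which the function name url_variants is an assumption for illustration; the resulting strings still have to be wrapped in the link-search syntax of the chosen search system.

```python
def url_variants(url):
    """Return the common spellings of a site address: with index.html, with a trailing slash, and without."""
    base = url
    if base.endswith("index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```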

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 201

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
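
The difference between the two orderings can be illustrated with a few invented entries. This is a deliberately simplified sketch: the entries and their link counts are hypothetical, and ranking by a raw count of received links is an assumption made here, since the actual link analysis by Google is more involved.

```python
# Hypothetical directory entries with the number of links each site receives.
entries = [
    {"site": "alpha.example", "inbound_links": 12},
    {"site": "beta.example", "inbound_links": 340},
    {"site": "gamma.example", "inbound_links": 57},
]

# Plain alphabetical order of the entries.
alphabetical = sorted(entries, key=lambda e: e["site"])
# Order by links received: the most linked site comes first.
by_links = sorted(entries, key=lambda e: e["inbound_links"], reverse=True)

alphabetical_sites = [e["site"] for e in alphabetical]
ranked_sites = [e["site"] for e in by_links]
```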

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
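
The mechanism described above (collected pages, an index built from their texts, and a search function on top of it) can be sketched in a few lines. This is a toy illustration under stated assumptions: the PAGES dictionary stands in for what a robot has already collected, and the names build_index and search are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") has already collected: URL -> page text.
PAGES = {
    "http://site-a.example/": "marine biology and oceanography",
    "http://site-b.example/": "marine engineering",
    "http://site-c.example/": "freshwater biology",
}


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
hits = search(index, "marine biology")
```

The query is answered from the index only; no page is fetched from the Internet at search time, which is why results can lag behind the live WWW.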

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
-- as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi --
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
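
The two ranking criteria above (many pages point to a page, and important pages point to it) can be illustrated with a simplified link-analysis sketch in the spirit of the published PageRank idea. It is not the algorithm actually run by Google: the function name link_rank, the damping value, the number of iterations and the LINKS data are all assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly over all pages.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score


# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}
scores = link_rank(LINKS)
```

In this toy graph, page c scores higher than a and b because two pages point to it, and page d benefits from receiving its single link from the well-scored page c.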

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
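
What the tilde does conceptually can be imitated by hand, for instance when preparing a query for a system without such an operator. A minimal sketch: the SYNONYMS list and the function name expand_tilde are hypothetical, and the synonyms that Google itself would add are not known here.

```python
# Hypothetical synonym list; it does not reproduce the synonyms used by Google.
SYNONYMS = {"sea": ["ocean", "marine"]}


def expand_tilde(query, synonyms):
    """Rewrite each ~word as an OR group of the word and its synonyms; leave other words unchanged."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word, [])
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(token)
    return " ".join(parts)


expanded = expand_tilde("fish ~sea pollution", SYNONYMS)
print(expanded)
```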

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search

• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
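
Recall and precision, named above, can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers are hypothetical, and the function names are chosen here, not taken from any search system.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical example: 4 documents retrieved, 5 relevant documents exist.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d3", "d5", "d6", "d7"]

print(precision(retrieved_docs, relevant_docs))  # 2 of the 4 retrieved are relevant
print(recall(retrieved_docs, relevant_docs))     # 2 of the 5 relevant were retrieved
```

A specialised index that excludes off-topic pages tends to raise precision; one that covers its niche more completely tends to raise recall.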

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
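
When truncation is not available, a searcher can write out the word variants and combine them in one query. The sketch below illustrates this workaround; it assumes that the search system accepts an OR operator, and the word list and function names are hypothetical.

```python
def expand_truncation(stem, vocabulary):
    """Return the words in `vocabulary` that start with `stem`."""
    return sorted(w for w in set(vocabulary) if w.startswith(stem))


def build_or_query(stem, vocabulary):
    """Combine all word variants into one query with the OR operator."""
    return " OR ".join(expand_truncation(stem, vocabulary))


# Hypothetical word list; in practice the searcher thinks of the variants.
words = ["librarian", "libraries", "library", "liberty", "librarianship"]
print(build_or_query("librar", words))
```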

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
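
To show what a proximity operator such as NEAR would do, here is a minimal sketch that tests whether two words occur within a given number of words of each other. The function name and the default distance are assumptions made for this illustration; no search engine's actual implementation is implied.

```python
import re


def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


print(near("Open access journals are free to read", "open", "journals", 2))
print(near("Open archives exist, and many electronic journals too", "open", "journals", 2))
```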

96

**--

Meta-search systems:
scheme 1
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
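
Clustering "on the fly" can be illustrated with a deliberately simple sketch: each retrieved title is grouped under the most frequent word it shares with other titles, using only the result list itself. This is not the method used by Vivisimo, which is not described here; the stop-word list, the function name and the sample titles are all assumptions.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on"}


def cluster_results(query, titles):
    """Group result titles under the most frequent non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        tokenised.append(words - query_words - STOPWORDS)
    # Count in how many titles each candidate heading word occurs.
    frequency = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        if words:
            # The most frequent word wins; ties are broken alphabetically.
            heading = min(words, key=lambda w: (-frequency[w], w))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal, philosopher",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Free Pascal programming compiler",
]
for heading, members in cluster_results("pascal", titles).items():
    print(heading, "->", members)
```

No index of the documents is built beforehand: the headings come only from the titles returned for this one search.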

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components shown: User (In / Out); Client computer + Multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
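
The integration step with removal of repeated results can be sketched as follows. The example interleaves ranked result lists and treats two URLs as the same result when host and path match after light normalisation (ignoring "www.", letter case of the host, a trailing slash, and any query string). The normalisation rule and the function names are assumptions for this illustration, not the behaviour of any particular meta-search system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparable key: host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two primary search systems.
engine_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_b = ["http://dogpile.com/", "http://www.mamma.com"]
print(merge_results(engine_a, engine_b))
```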

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Labels shown:
»Internet: telnet, ftp, ..., WWW
»Static texts in the WWW that can be indexed ( = on HTTP server computers): covered partly by Internet indexes
»Databases and file archives accessible through the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files; PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
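
The difference between a modified page and a new page can be illustrated with a small sketch. It assumes that a tracking service stores a fingerprint of each page at every visit; fetching the pages is left out, and the function names and example URLs are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_pages(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint stored at the last visit
    current:  dict mapping URL -> page text fetched now
    Returns (new_urls, changed_urls), each sorted.
    """
    new_urls = sorted(u for u in current if u not in previous)
    changed_urls = sorted(
        u for u, text in current.items()
        if u in previous and previous[u] != fingerprint(text)
    )
    return new_urls, changed_urls


# Hypothetical state from the last visit and texts fetched now.
stored = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
fetched = {
    "http://example.org/a": "new text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
print(check_pages(stored, fetched))
```

An alert service would then send the two lists to the subscriber by email.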

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
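
A stored interest profile of this kind can be matched against new book records as in the sketch below. The record fields ("title", "category"), the profile layout and the function names are assumptions for this illustration; they do not describe how any bookshop implements its alert service.

```python
def matches_profile(book, profile):
    """Return True if the book fits the stored interest profile."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words
                      for k in profile.get("keywords", []))
    category_hit = book.get("category") in profile.get("categories", [])
    return keyword_hit or category_hit


def new_book_alerts(new_books, profile):
    """Return the titles of the new books that should trigger an alert."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical profile and newly published books.
profile = {"keywords": ["oceanography"], "categories": ["Marine biology"]}
new_books = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral Reefs", "category": "Marine biology"},
    {"title": "Cooking Basics", "category": "Food"},
]
print(new_book_alerts(new_books, profile))
```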

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Diagram: relations of a thesaurus term]
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
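
The procedure described above (look up a term in a thesaurus first, then build the query) can be sketched as follows. The mini-thesaurus is hypothetical, the relation codes follow the usual BT / NT / RT / UF convention, and the sketch assumes that the target database accepts OR and quoted phrases.

```python
# Hypothetical mini-thesaurus with one entry.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT", "RT")):
    """Build an OR query from a term and its thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("BT",)))
```

Adding synonyms and narrower terms mainly serves recall; replacing a vague word by a more suitable term from the thesaurus mainly serves precision.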

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
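
The two directions of citation searching can be illustrated with a small sketch. The citation data below are invented, and the function names `snowball`, `build_citation_index` and `cited_by` are hypothetical; the sketch only shows the principle, not how any commercial citation index is built.

```python
# Hypothetical citation data: each paper maps to the earlier papers it cites.
CITES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
    "X": ["seed"],
    "Y": ["seed", "A"],
}


def snowball(seed, cites=CITES):
    """Follow reference lists backwards in time (snowball effect)."""
    found, stack = set(), [seed]
    while stack:
        paper = stack.pop()
        for reference in cites.get(paper, []):
            if reference not in found:
                found.add(reference)
                stack.append(reference)
    return found


def build_citation_index(cites=CITES):
    """Invert the reference lists: cited paper -> set of papers citing it."""
    index = {}
    for paper, references in cites.items():
        for reference in references:
            index.setdefault(reference, set()).add(paper)
    return index


def cited_by(seed, cites=CITES):
    """Look up the more recent papers that cite the seed document."""
    return build_citation_index(cites).get(seed, set())


print(sorted(snowball("seed")))   # older publications: ['A', 'B', 'C']
print(sorted(cited_by("seed")))   # more recent publications: ['X', 'Y']
```

The reference lists are available in every document, so the backward search needs no special tool; the forward search is only possible once the inverted index has been built, which is what a citation index offers.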

211

***-

Information retrieval:
using citations (scheme)
Scheme: on a time axis (Past, Now, Future),
snowball citation searching leads from the seed document to the past;
citation indexing leads from the seed document to the future.

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
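
The variants listed above can be generated automatically before they are submitted one by one to a search system. The following is a minimal sketch; the function name `url_variants` is hypothetical, and the assumption that the default page is named index.html or index.htm does not hold for every web server.

```python
def url_variants(url):
    """Return plausible spellings of one page address for link searches."""
    # Drop the scheme (http://) so that the variants start with the host name.
    rest = url.split("://", 1)[-1]
    variants = [rest]
    # Assumed default page names; other servers may use other names.
    for index_name in ("index.html", "index.htm"):
        if rest.endswith("/" + index_name):
            rest = rest[: -len(index_name)]   # keep the trailing slash
            variants.append(rest)
            break
    if rest.endswith("/"):
        variants.append(rest.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be combined with the link-search syntax of the chosen search system, which is described in its help pages.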

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 202

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview
of this tutorial / workshop

7

Contents / summary / structure / overview
of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
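
The simplified statement above (more links received means a higher rank) can be sketched as follows. The link data and the function name `rank_entries` are invented for this illustration; the real ranking used by Google is more refined than a plain count of links.

```python
from collections import Counter

# Hypothetical link data: (source page, target site) pairs.
LINKS = [
    ("p1", "site-a"), ("p2", "site-a"), ("p3", "site-a"),
    ("p1", "site-b"),
    ("p2", "site-c"), ("p3", "site-c"),
]


def rank_entries(entries, links=LINKS):
    """Sort directory entries by number of received links, then alphabetically."""
    inlinks = Counter(target for _, target in links)
    return sorted(entries, key=lambda site: (-inlinks[site], site))


# An alphabetical directory listing, re-ranked by received links:
print(rank_entries(["site-a", "site-b", "site-c", "site-d"]))
# prints: ['site-a', 'site-c', 'site-b', 'site-d']
```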

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: starting from the complete WWW,
a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by browsing/navigating alone.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
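
The mechanism described above can be illustrated with a toy index. In this sketch the "crawled" pages are invented stand-ins for what a robot would collect, the function names are hypothetical, and the implicit AND between query words is an assumption; real systems add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a crawler; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Sea level and hydrology",
    "http://example.org/c": "Library science courses",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs of the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)            # built beforehand, not at query time
print(sorted(search(INDEX, "sea")))
print(sorted(search(INDEX, "sea hydrology")))
```

The query is answered from the index that was built beforehand, which is why such systems do not need to visit the Internet in real time.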

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
maximum 10 words could be used in a query, up to 2005.)
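
The two ranking criteria above (many pages point to a page, and “important” pages point to it) can be illustrated with a PageRank-style iteration. The slide does not name the algorithm, so this is an assumption for the sake of illustration: the function name `pagerank`, the damping factor 0.85 and the tiny link graph are all chosen for this sketch and do not describe the production system of Google.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Invented link graph: a and b point to c, c points to d, d points to a.
WEB = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["a"]}
scores = pagerank(WEB)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph, page c ranks highest because two pages point to it. Pages d and a each receive only one link, but d ranks above a because its link comes from the more “important” page c.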

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
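
When queries are composed by a program, the tilde can be added in the same way. The helper below is only a sketch with a hypothetical name, `with_synonym_operator`; it reproduces the example query above and makes no claim about how a search system interprets the result.

```python
def with_synonym_operator(words, expand):
    """Prefix every word that is listed in `expand` with a tilde."""
    return " ".join("~" + word if word in expand else word for word in words)


print(with_synonym_operator(["word1", "word2", "word3", "word4"], {"word2"}))
# prints: word1 ~word2 word3 word4
```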

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that most other search systems lack:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that most other search systems lack:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning
of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
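The two quality measures named on this slide, recall and precision, can be made concrete with a small sketch. The document identifiers and the relevance judgements below are invented purely for illustration; they are not measurements of any real search system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: documents returned by the search system
    relevant:  documents that are actually relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result sets for the same query (all names are made up).
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))
print(precision_recall(specialised_engine, relevant))
```

In this invented case the specialised system scores higher on both measures, which is the situation the slide describes; real systems have to be evaluated case by case.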

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
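When a search system offers no truncation, the searcher can write out the word variants by hand. A minimal sketch of that workaround follows; the vocabulary list is hypothetical, and the sketch assumes that the target system accepts OR between terms.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in set(vocabulary) if w.lower().startswith(stem))


def build_or_query(pattern, vocabulary):
    """Join the expanded variants with OR, for systems without truncation."""
    return " OR ".join(expand_truncation(pattern, vocabulary))


# Hypothetical word list; in practice the searcher supplies the variants.
vocabulary = ["library", "libraries", "librarian", "liberty", "librarians"]
query = build_or_query("librar*", vocabulary)
print(query)
```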

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
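What a proximity operator such as NEAR does can be sketched in a few lines. This is an illustration of the concept only, not the implementation of any particular search system; the default distance of 5 words is an arbitrary assumption.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


sentence = "open access journals are free of charge"
print(near(sentence, "open", "journals", max_distance=3))
print(near(sentence, "open", "charge", max_distance=3))
```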

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) <-> Client computer + WWW client program <-> WWW server computer <-> Internet / WWW <-> WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
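To give an idea of what clustering "on the fly" means, here is a deliberately crude sketch that groups result snippets at search time, using only the snippets themselves. It is NOT the algorithm used by Vivisimo, which is not described here; the labelling rule, the stopword list and the sample snippets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "by"}


def cluster_results(query, snippets):
    """Group result snippets under a label word chosen at search time."""
    query_words = set(query.lower().split())

    def terms(snippet):
        words = re.findall(r"[a-z]+", snippet.lower())
        return [w for w in words if w not in STOPWORDS and w not in query_words]

    # Document frequency: in how many snippets does each term occur?
    doc_freq = Counter(t for s in snippets for t in set(terms(s)))
    clusters = defaultdict(list)
    for s in snippets:
        candidates = set(terms(s))
        if not candidates:
            clusters["other"].append(s)
            continue
        # Label = the term that occurs in the most snippets; ties are broken alphabetically.
        label = min(candidates, key=lambda t: (-doc_freq[t], t))
        clusters[label].append(s)
    return dict(clusters)


# Invented snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pascal the philosopher and his wager",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, members)
```

No index of the documents is built beforehand: the grouping is computed from the retrieved results only, which is the point made on the slide.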

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) <-> Client computer + Multi-threaded Internet search client program <-> Internet / WWW <-> WWW server computers with Internet search systems]
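The idea of a multi-threaded search client can be sketched as follows: the same query is sent to several search systems in parallel from the client computer. The two "engines" below are hypothetical stand-in functions that return made-up URLs, so that the sketch runs without any network access; a real client would contact each search system over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems:
# each takes a query and returns a ranked list of URLs.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://shared.example/{query}"]


def engine_b(query):
    return [f"http://shared.example/{query}", f"http://b.example/{query}/1"]


def multi_threaded_search(query, engines):
    """Send the same query to all engines in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}


results = multi_threaded_search("thesaurus", {"A": engine_a, "B": engine_b})
for name, urls in results.items():
    print(name, urls)
```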

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
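The integration of results with removal of repeated results can be sketched as below. The normalisation rule (ignore "www." and a trailing slash, and ignore any query string) and the round-robin interleaving are assumptions chosen for simplicity; actual meta-search systems may merge and rank quite differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.',
    plus the path without a trailing slash (query strings are ignored)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping only the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Two hypothetical result lists from two primary search systems.
list_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
list_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
merged = merge_results([list_1, list_2])
print(merged)
```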

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: the Internet comprises telnet, ftp, ... and the WWW.
Shown within the WWW:
static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes;
databases and file archives accessible through the Internet (CGI, ASP,...);
information accessible only when passwords are used;
rapidly changing information, such as news;
Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
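How such a tracking service can distinguish modified pages from new pages may be sketched as follows. This is an assumption about one simple approach (comparing stored fingerprints of page contents), not a description of any of the services named in this presentation. The page contents are passed in as text, so that the sketch runs without network access; the URLs are made up.

```python
import hashlib


def fingerprint(page_content):
    """Return a SHA-256 digest of the page content (a str)."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with freshly fetched pages.

    previous:      dict url -> fingerprint from the last check
    current_pages: dict url -> page content fetched now
    Returns (changed_urls, new_urls, updated_fingerprints).
    """
    changed, new, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new.append(url)          # page was not known at the last check
        elif previous[url] != digest:
            changed.append(url)      # known page with modified content
    return changed, new, updated


# Hypothetical state from the last check, and the pages as fetched now.
previous = {
    "http://site.example/a": fingerprint("old text"),
    "http://site.example/b": fingerprint("same text"),
}
current_pages = {
    "http://site.example/a": "new text",
    "http://site.example/b": "same text",
    "http://site.example/c": "fresh page",
}
changed, new, updated = detect_changes(previous, current_pages)
print("changed:", changed)
print("new:", new)
```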

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database to prefer for a
particular application is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
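
The workflow described above (look up a term in a thesaurus first, then formulate the query) can be sketched in a few lines of Python. This is only a minimal illustration: the entry for "ocean", the names ThesaurusEntry, THESAURUS and expand_query, and the OR query syntax are all assumptions made for the example, not part of any particular thesaurus system or database.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the four relation types BT, NT, RT and UF."""
    term: str
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (synonyms)


# Hypothetical toy thesaurus, for illustration only.
THESAURUS = {
    "ocean": ThesaurusEntry(
        term="ocean",
        broader=["water body"],
        narrower=["Atlantic Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms and (optionally) its narrower terms."""
    entry = thesaurus.get(term)
    if entry is None:
        # Term not in the thesaurus: search for it unchanged.
        return f'"{term}"'
    terms = [entry.term] + entry.used_for
    if include_narrower:
        terms += entry.narrower
    return " OR ".join(f'"{t}"' for t in terms)
```

Adding synonyms and narrower terms mainly increases recall; choosing the most suitable term among the candidates is what helps precision. Whether the target database accepts OR and quoted phrases has to be checked in its own help pages.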

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
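
The two directions of citation searching can be illustrated with a small Python sketch. It assumes that we already have, for each document, the list of documents it cites; the toy data and the function names snowball and build_citation_index are hypothetical and only serve the example.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "x": ["seed"],
    "y": ["x", "a"],
}


def snowball(seed, references):
    """Follow reference lists back in time, breadth first (snowball effect)."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for cited in references.get(doc, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append(cited)
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return dict(index)
```

Here snowball("seed", REFERENCES) walks to the older documents, while build_citation_index(REFERENCES)["seed"] lists the more recent documents that cite the seed. The length of such a list is also the citation count that a citation search estimates.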

211

***-

Information retrieval:
using citations (scheme)
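[Diagram: a time axis running from Past through Now to Future, with the seed document on it.
Snowball citation searching leads from the seed document back into the past;
citation indexing leads from the seed document forward into the future.]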

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
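
Generating the variants listed above by hand is error-prone, so a small helper can produce them. The sketch below is an assumption-laden illustration: the function name url_variants is hypothetical, and index.html is assumed to be the default page name of the server (other servers may use a different default).

```python
def url_variants(url, default_page="index.html"):
    """Return the spellings under which the same page may be linked."""
    base = url
    # Drop an explicit default page name, if present.
    if base.endswith("/" + default_page):
        base = base[: -len(default_page)]
    # Drop trailing slashes so that all variants start from the same base.
    base = base.rstrip("/")
    return [base + "/" + default_page, base + "/", base]
```

Each returned variant can then be submitted as a separate link query, using the syntax of the chosen search system.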

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
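
The mechanism described above can be made concrete with a very small Python sketch of such an index. It is only an illustration under stated assumptions: the three pages stand in for what a robot would have collected, the names PAGES, tokenize, build_index and search are hypothetical, and real systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Pascal programming language",
}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

A query is answered from the prebuilt index only, which is why the search does not have to visit the Internet in real time.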

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
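
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched with a simplified PageRank-style iteration. This is an illustration only and not the actual algorithm used by Google: the function name rank_pages, the damping value 0.85, the number of iterations and the toy link graph are all assumptions.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages from their incoming links (simplified PageRank-style iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical toy web: a, b and d all link to c; c links to a.
TOY_LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
```

With TOY_LINKS, page c receives the most links and gets the highest score; page a scores above b and d because the "important" page c points to it.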

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own, independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
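
The two measures named on this slide, recall and precision, can be made concrete with a small calculation. The following is a minimal sketch in Python; the document identifiers and the relevance judgements are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers of the documents returned by the search system
    relevant:  identifiers of all documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of what was retrieved is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of what is relevant was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5", "d6"])
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

In this invented case the precision is 0.75 and the recall is 0.50.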

90

**--

Internet indexes:
non-global, regional systems

(Diagram: nested areas. Outermost: the complete WWW.
Inside it: the part covered by a global / international Internet index.
Inside that: the part covered by an index limited to
sources in/of a country or region.)

91

**--

Internet indexes:
subject-specific, specialised systems

(Diagram: nested areas. Outermost: the complete WWW.
Inside it: the part covered by a global / international Internet index.
Inside that: the part covered by an Internet index limited to
sources related to a specific subject.)

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
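
When a search system offers no truncation, the searcher can work around this by typing the word variants explicitly and combining them with OR. The following is a minimal sketch in Python of that workaround; the word list is an invented example, and the function name is hypothetical.

```python
def expand_truncation(stem, vocabulary):
    """Return an OR query with every vocabulary word that starts with `stem`.

    This imitates what a truncation operator (for instance librar*)
    would do in a system that supports it.
    """
    variants = sorted(w for w in vocabulary if w.startswith(stem))
    if not variants:
        return stem
    return " OR ".join(variants)


# Invented word list; in practice the searcher thinks of the variants.
vocabulary = ["library", "libraries", "librarian", "librarians", "liberty"]
query = expand_truncation("librar", vocabulary)
print(query)
```

The resulting query string can then be pasted into the search box of the system.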

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
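
What a proximity operator such as NEAR does can be sketched as follows. This is only an illustration in Python, applied to a text that the searcher already has at hand; it is not a feature of any particular search system, and the function name and the default distance are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "Open access journals are indexed in a public directory of journals."
print(near(doc, "open", "directory", max_distance=3))   # words are 8 positions apart
print(near(doc, "open", "directory", max_distance=10))
```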

96

**--

Meta-search systems:
scheme 1
(Diagram: the user works at a client computer with a WWW client
program; over the Internet / WWW this contacts one WWW server
computer, which in turn contacts the WWW server computers
with Internet search systems. Labels: In, Out.)

97

**--

Meta-search systems:
scheme 2
(Diagram: the user works at a client computer with a multi-threaded
Internet search client program; over the Internet / WWW this
contacts the WWW server computers with Internet search systems.
Labels: In, Out.)

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
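(Diagram: the user works at a client computer with a WWW client
program; over the Internet / WWW this contacts one WWW server
computer, which in turn contacts the WWW server computers
with Internet search systems. Labels: In, Out.)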

100

**--

Meta-search systems:
relations
(Diagram: User → an Internet meta-search system →
Internet search system 1 and Internet search system 2 →
the database collected by each Internet search system (1 and 2) →
WWW pages.)

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
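
The idea of clustering results on the fly, using only the retrieved titles or snippets, can be illustrated with a deliberately simple sketch in Python. This is NOT the method used by Vivisimo, which is not described here; the labelling rule (the most widely shared word that is not in the query), the stopword list and the snippets are all assumptions made for the illustration.

```python
import re
from collections import Counter, defaultdict

# Small, invented stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "was"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared word that is not in the query."""
    query_words = set(query.lower().split())
    tokenised = []
    for s in snippets:
        words = set(re.findall(r"[a-z]+", s.lower())) - STOPWORDS - query_words
        tokenised.append(words)
    # Document frequency: in how many snippets does each word occur?
    df = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Label = the word of this snippet shared by most snippets
        # (ties are broken alphabetically).
        label = min(words, key=lambda w: (-df[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Invented snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Free Pascal compiler for the Pascal programming language",
    "Blaise Pascal, philosopher and mathematician",
    "The philosopher Blaise Pascal and his Pensees",
]
for label, members in cluster_results("pascal", snippets).items():
    print(label, "->", len(members), "results")
```

With these snippets, the two results about the programming language end up in one group and the two about the person in another, without any processing before the search.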

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
(Diagram: the user works at a client computer with a multi-threaded
Internet search client program; over the Internet / WWW this
contacts the WWW server computers with Internet search systems.
Labels: In, Out.)

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be searched one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
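
The time saving comes from sending one query to several search systems at the same time. The following is a minimal sketch of that principle in Python, using threads from the standard library. The two "engines" are hypothetical stand-ins that return fixed lists; a real meta-search program would send requests over the Internet instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search systems: each takes a query
# and returns a list of result URLs.
def search_engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def search_engine_b(query):
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to all engines at once; collect the results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}


results = meta_search("thesaurus", {"A": search_engine_a, "B": search_engine_b})
for name, urls in results.items():
    print(name, urls)
```

The searcher deals with a single function call (one "user interface"), while the program takes care of the individual sources.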

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
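
How such an integration with removal of repeated results might work is sketched below in Python. The URL normalisation and the round-robin merging are assumptions chosen for simplicity; actual meta-search systems may use quite different rules.

```python
def normalise(url):
    """Crude URL normalisation so that trivial variants count as the same result."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists (round robin) and drop repeated results."""
    seen, merged = set(), []
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                key = normalise(lst[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged


# Invented result lists from two primary search systems.
engine_1 = ["http://www.doaj.org/", "http://www.scirus.com/"]
engine_2 = ["http://doaj.org", "http://citeseer.ist.psu.edu/"]
print(merge_results([engine_1, engine_2]))
```

Here the second occurrence of the DOAJ address is recognised as a repeat, so the merged list contains three results instead of four.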

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
(Diagram: what is accessible through the Internet.)
• Static texts in the WWW that can be indexed
( = on HTTP server computers):
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
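
The principle behind tracking changes in an existing page can be sketched in Python as follows: store a fingerprint of the page content and compare it with the fingerprint of a later download. This is an illustration only and not how any particular alert service works; the page address and contents are invented, and fetching the page is left out.

```python
import hashlib


def fingerprint(page_content):
    """Return a fingerprint of the page content (SHA-256 hex digest)."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def check_for_update(url, new_content, known_fingerprints):
    """Compare a freshly fetched page with the fingerprint stored earlier.

    Returns "new", "changed" or "unchanged" and updates the stored fingerprint.
    """
    new_fp = fingerprint(new_content)
    old_fp = known_fingerprints.get(url)
    known_fingerprints[url] = new_fp
    if old_fp is None:
        return "new"
    return "unchanged" if old_fp == new_fp else "changed"


store = {}
url = "http://www.example.org/page.html"  # hypothetical page
print(check_for_update(url, "<html>version 1</html>", store))
print(check_for_update(url, "<html>version 1</html>", store))
print(check_for_update(url, "<html>version 2</html>", store))
```

The three calls report "new", "unchanged" and "changed" respectively; an alert service would send an email in the first and the last case.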

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text (journal articles, journal issues,
books, reports, conferences, doctoral dissertations),
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
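
The use of a thesaurus before searching can be sketched as follows. This is only a minimal illustration: the miniature thesaurus, its entries and the function name expand_query are hypothetical and not taken from any real thesaurus system.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "BT": ["water body"],      # broader terms
        "NT": ["coastal sea"],     # narrower terms
        "RT": ["marine science"],  # related terms
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with the thesaurus terms found for it into an OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea", THESAURUS))  # sea OR ocean OR "coastal sea"
```

Adding synonyms and narrower terms mainly increases recall; the searcher still has to judge which of the suggested terms really fit the information need.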

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
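
The two directions of citation searching can be sketched in a few lines. The citation data and the function names below are hypothetical; a real citation index holds the same kind of relation for millions of documents.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, references):
    """Backward search: follow reference lists repeatedly to older work."""
    found, stack = set(), [doc]
    while stack:
        current = stack.pop()
        for cited in references.get(current, []):
            if cited not in found:
                found.add(cited)
                stack.append(cited)
    return found


def citing_documents(doc, references):
    """Forward search: documents whose reference list contains doc."""
    return {source for source, cited in references.items() if doc in cited}


print(sorted(snowball("seed", REFERENCES)))          # older publications
print(sorted(citing_documents("seed", REFERENCES)))  # more recent publications
```

The backward search needs only the documents themselves, while the forward search needs an index built over the reference lists of many documents, which is what a citation index provides.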

211

***-

Information retrieval:
using citations (scheme)
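[Scheme: a time axis running from past through now to future.
Starting from the seed document, snowball citation searching
leads back to older publications, and citation indexing leads
forward to more recent publications.]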

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
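
How such counts follow from citation data can be sketched as below. The articles, authors and function names are hypothetical, and the sketch assumes that each citing document counts once per cited article.

```python
from collections import Counter

# Hypothetical data: reference lists and authors of four articles.
REFERENCES = {"a1": [], "a2": ["a1"], "a3": ["a1", "a2"], "a4": ["a1"]}
AUTHORS = {"a1": ["Smith"], "a2": ["Smith", "Jones"], "a3": ["Lee"], "a4": ["Lee"]}


def citations_per_article(references):
    """Count how many documents cite each article."""
    counts = Counter()
    for cited_list in references.values():
        counts.update(set(cited_list))  # a citing document counts once
    return counts


def citations_per_author(references, authors):
    """Add up the citations received by all articles of each author."""
    counts = Counter()
    for article, number in citations_per_article(references).items():
        for author in authors.get(article, []):
            counts[author] += number
    return counts


print(citations_per_article(REFERENCES))
print(citations_per_author(REFERENCES, AUTHORS))
```

The result is only an estimate: it depends on which documents the database covers and on how well author names can be told apart.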

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
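
Generating these variants can be automated. The sketch below is a minimal illustration; the function name url_variants and the assumption that the default page is called index.html or index.htm are ours, not a rule of any search system.

```python
def url_variants(url):
    """Return the spellings under which the same page may be linked."""
    variants = {url}
    for index_name in ("index.html", "index.htm"):
        if url.endswith("/" + index_name):
            base = url[: -len(index_name)]  # keeps the trailing slash
            variants.add(base)
            variants.add(base.rstrip("/"))
    if url.endswith("/"):
        variants.add(url.rstrip("/"))
    return sorted(variants)


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be entered separately in the link search of the chosen system, and the results combined.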

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 204

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, which covers the complete WWW,
can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
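
The mechanism described above can be sketched on a very small scale. The pages below stand in for what a crawler would have collected; the URLs, texts and function names are hypothetical, and real systems add ranking, word normalisation and much more.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a "robot" collected earlier.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "open access journals",
}


def build_index(pages):
    """Build the index: for each word, the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the index alone, without visiting the pages again; this is why a search is fast, and also why the results can be out of date.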

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
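
The idea of ranking by links can be illustrated with a simplified iteration in the style of the published PageRank method. This is only a sketch under stated assumptions: the link data, the function name rank_pages and the parameter values are hypothetical, and the algorithm actually used by Google is more elaborate and not public in detail.

```python
# Hypothetical link data: each page maps to the pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}


def rank_pages(links, damping=0.85, iterations=50):
    """Give each page a score based on the scores of the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


scores = rank_pages(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example page c ranks highest because many pages point to it, and page a ranks above b and d because the "important" page c points to it.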

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
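
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name expand_tilde are hypothetical; the sketch shows the principle only, not how Google implements it.

```python
# Hypothetical synonym table; a real system uses a much larger one.
SYNONYMS = {"sea": ["ocean"], "picture": ["image", "photo"]}


def expand_tilde(query, synonyms):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word, [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the word itself
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_tilde("pollution ~sea map", SYNONYMS))  # pollution (sea OR ocean) map
```

Only the marked word is expanded; the other words of the query are left as they are.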

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.
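
As a minimal sketch of what full Boolean searching means, the following Python example builds a tiny inverted index and evaluates AND, OR and NOT as set operations. The sample documents and the helper names `build_index` and `term` are illustrative assumptions; this is not how Hotbot or any other particular search engine is implemented.

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "open access journals in marine science",
    2: "commercial journals and databases",
    3: "open archives for marine research",
}


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Ids of the documents that contain the word (empty set if none)."""
    return INDEX.get(word.lower(), set())


# Boolean operators map directly onto set operations.
open_and_marine = term("open") & term("marine")            # AND
journals_or_archives = term("journals") | term("archives")  # OR
marine_not_journals = term("marine") - term("journals")     # NOT

print(sorted(open_and_marine), sorted(journals_or_archives), sorted(marine_not_journals))
```

Knowing which operator a system applies by default matters because the same two words give a small result set under AND and a much larger one under OR.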

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system,
which competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
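
Recall and precision can be made concrete with a small calculation. The sketch below assumes that we know which documents are relevant, which in practice is rarely the case; the function name and the sample document ids are illustrative only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist,
# 2 of the retrieved documents are relevant.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```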

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
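
To illustrate what truncation does in systems that support it, the following sketch expands a right-truncated term such as `librar*` against a word list and joins the variants with OR, which is one way a searcher can imitate truncation by hand. The vocabulary and the function name `expand_truncation` are assumptions made for this example.

```python
def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a right-truncated term such as 'librar*'."""
    if not pattern.endswith("*"):
        # No truncation symbol: the term only matches itself.
        return [pattern] if pattern in vocabulary else []
    stem = pattern[:-1]
    return sorted(word for word in vocabulary if word.startswith(stem))


# Hypothetical word list standing in for the index vocabulary.
VOCAB = {"library", "libraries", "librarian", "liberty", "book"}

variants = expand_truncation("librar*", VOCAB)
or_query = " OR ".join(variants)
print(or_query)
```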

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
[Diagram: User (In / Out) - Client computer + WWW client program - WWW server computer - Internet / WWW - WWW server computers with Internet search systems]

97

**--

Meta- search systems:
scheme 2
[Diagram: User (In / Out) - Client computer + Multi-threaded Internet search client program - Internet / WWW - WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out) - Client computer + WWW client program - WWW server computer - Internet / WWW - WWW server computers with Internet search systems]

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
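
The idea of clustering results on the fly can be sketched with a deliberately naive method: group each result title under the most frequent non-query word it contains. This is only an illustration under simple assumptions (a small stop-word list, titles as the only input); it is not the algorithm used by Vivisimo, which is not described in these slides.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_results(query, titles):
    """Group result titles under the most frequent non-query word they contain."""
    query_words = set(query.lower().split())

    def words(title):
        return [w for w in title.lower().split()
                if w not in STOPWORDS and w not in query_words]

    # Count in how many titles each candidate word occurs.
    counts = Counter(w for title in titles for w in set(words(title)))

    clusters = defaultdict(list)
    for title in titles:
        candidates = words(title)
        # max() keeps the earliest word in the title when counts are tied.
        label = max(candidates, key=lambda w: counts[w]) if candidates else "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Pascal philosopher biography",
]

clusters = cluster_results("pascal", TITLES)
for label, members in clusters.items():
    print(label, "->", members)
```

Everything happens after the results arrive, so nothing about the documents has to be known in advance; real systems use far more robust language analysis.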

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out) - Client computer + Multi-threaded Internet search client program - Internet / WWW - WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
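
The integration step can be sketched as follows: result lists from several engines are interleaved by rank, and a result is dropped when its normalised URL has already been seen. The normalisation rules and the function names are assumptions for this example; real meta-search systems may merge and rank quite differently.

```python
def normalize(url):
    """Crude URL normalisation so that trivial variants count as the same result."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and remove repeated results."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    # Take rank 1 from every engine, then rank 2, and so on.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_b = ["http://dogpile.com", "http://www.mamma.com"]

merged = merge_results(engine_a, engine_b)
print(merged)
```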

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is available through the Internet]
• telnet, ftp, ...
• Internet / WWW:
»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Databases and file archives accessible through
the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news
»Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
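
A change-tracking service can be sketched as a comparison of content fingerprints between two visits. In the example below the page texts are passed in directly, so no network access is needed; the URLs, the function names and the choice of SHA-256 are assumptions made for the illustration, not a description of any particular alert service.

```python
import hashlib


def fingerprint(page_text):
    """Return a stable fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_pages(stored, current_pages):
    """Compare current page texts with the stored fingerprints.

    Returns (new_urls, changed_urls) and updates `stored` in place,
    so that the next check compares against this visit.
    """
    new_urls, changed_urls = [], []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            new_urls.append(url)
        elif stored[url] != digest:
            changed_urls.append(url)
        stored[url] = digest
    return new_urls, changed_urls


# Hypothetical pages on two successive visits.
stored = {}
first_visit = check_pages(stored, {"http://example.org/a": "version 1"})
second_visit = check_pages(stored, {
    "http://example.org/a": "version 2",   # modified page
    "http://example.org/b": "brand new",   # new page
})
print(first_visit, second_visit)
```

An alert service would run such a check on a schedule and send the two lists to the subscriber by email.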

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
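
How a stored interest profile might be matched against newly published books can be sketched as follows. The profile layout (a set of keywords and a set of subject categories) follows the two forms named above; the field names, the sample books and the matching rule are assumptions for this example and do not describe the service of any bookshop.

```python
def matches_profile(book, profile):
    """True if the book title contains a profile keyword or the book falls in a profile category."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


# Hypothetical interest profile stored on the server.
PROFILE = {
    "keywords": {"oceanography", "aquaculture"},
    "categories": {"Marine biology"},
}

# Hypothetical list of newly published books.
NEW_BOOKS = [
    {"title": "Introduction to Oceanography", "category": "Earth sciences"},
    {"title": "Coral Reefs", "category": "Marine biology"},
    {"title": "Medieval History", "category": "History"},
]

alerts = [book["title"] for book in NEW_BOOKS if matches_profile(book, PROFILE)]
print(alerts)
```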

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
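
Simultaneous searching can be sketched with a thread pool that sends one query to several catalogues in parallel and collects the answers per source. The catalogues here are in-memory stand-ins with invented names and titles; a real service would query remote systems (for instance over Z39.50 or HTTP) and would have to cope with targets that are slow or unavailable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote library and bookshop catalogues.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Pollution", "Freshwater Fish"],
    "bookshop_c": ["Introduction to Marine Biology"],
}


def search_catalogue(name, query):
    """Search one catalogue; returns (catalogue name, matching titles)."""
    titles = CATALOGUES[name]
    return name, [title for title in titles if query.lower() in title.lower()]


def search_all(query):
    """Send the same query to every catalogue in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = [pool.submit(search_catalogue, name, query) for name in CATALOGUES]
        return dict(future.result() for future in futures)


results = search_all("marine")
for source, titles in results.items():
    print(source, "->", titles)
```

Only a plain keyword query is passed to every target, which mirrors the limitation noted above: the common interface can offer no more than what all targets support.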

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
(diagram: relations between a term and other terms)
• BT (= Broader Term): term(s) with broader meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
• NT (= Narrower Term): term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
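The use of a thesaurus to prepare a query can be sketched in a few lines of Python. This is only an illustration: the mini-thesaurus, the function names and the OR syntax are assumptions, not taken from any particular thesaurus or search system, so check the help pages of the database you want to search.

```python
# Hypothetical mini-thesaurus; the entries are illustrative only.
THESAURUS = {
    "ocean": {
        "BT": ["water body"],                       # broader term
        "NT": ["Atlantic Ocean", "Pacific Ocean"],  # narrower terms
        "RT": ["marine science"],                   # related term
        "UF": ["sea"],                              # synonym
    },
}

def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms linked to it by the given relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms

def build_query(term, relations=("UF", "NT")):
    """Combine the expanded terms with OR; multi-word terms are quoted as phrases."""
    parts = []
    for t in expand_term(term, relations):
        parts.append(f'"{t}"' if " " in t else t)
    return " OR ".join(parts)

print(build_query("ocean"))
```

Adding synonyms and narrower terms mainly increases recall; choosing a more suitable term from the thesaurus instead of the original one mainly helps precision.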

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
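The two directions of citation searching described above can be sketched with a small Python example. The citation data and the function names are hypothetical; a real citation index is of course much larger and is built by a database producer.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}

def snowball(seed, references):
    """Follow reference lists backwards in time, collecting all older documents."""
    found, stack = [], [seed]
    while stack:
        doc = stack.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.append(cited)
                stack.append(cited)
    return found

def build_citation_index(references):
    """Invert the reference lists: for each document, list the documents that cite it."""
    index = {}
    for doc, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, []).append(doc)
    return index

print(sorted(snowball("seed", REFERENCES)))        # older publications
print(build_citation_index(REFERENCES)["seed"])    # more recent, citing publications
```

The first function needs only the documents themselves; the second requires an index that somebody has built in advance, which is why citation indexes are offered as separate databases.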

211

***-

Information retrieval:
using citations (scheme)
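(Scheme: a time axis from past to future, with “now” marked.
From the seed document, snowball citation searching leads to
older documents, and citation indexing leads to more recent ones.)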

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
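Generating the URL variants listed above can be automated. The following Python sketch is an assumption about which variants are worth trying (index page, directory with trailing slash, directory without it); the function name is hypothetical, and the query syntax for link searching itself still has to be taken from the help pages of the search system.

```python
def url_variants(url):
    """Return the spellings of a URL that may appear in links to the same page."""
    variants = [url]
    # A link may point at the directory instead of its index page.
    for index_page in ("index.html", "index.htm"):
        if url.endswith("/" + index_page):
            variants.append(url[: -len(index_page)])
    # A link may also omit the trailing slash of the directory.
    for variant in list(variants):
        if variant.endswith("/") and not variant.endswith("//"):
            shorter = variant[:-1]
            if shorter not in variants:
                variants.append(shorter)
    return variants

for v in url_variants("http://web-server-computer.country/website/index.html"):
    print(v)
```

Each variant can then be used in a separate link search, and the results can be combined.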

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information
offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW,
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
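The mechanism described above (collect pages, build an index of their words, then answer queries from that index instead of from the live Internet) can be sketched in Python. The pages, URLs and function names are hypothetical, and the crawling step is assumed to have happened already; real systems add ranking, much larger databases and continuous updating.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") is assumed to have collected already.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

Because a query is answered from the prepared index, the answer is fast, but it reflects the pages as they were when the robot last collected them.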

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine Alltheweb and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
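The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified link-analysis sketch in Python. This is not Google's actual algorithm; the link data, the function name, the damping value and the number of iterations are assumptions chosen only for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical web: A and B link to C; C and E link to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["D"]}

SCORES = rank_pages(LINKS)
for page in sorted(SCORES, key=SCORES.get, reverse=True):
    print(page, round(SCORES[page], 3))
```

In this small example, C scores higher than A because two pages point to it, and D scores higher than C because one of the pages pointing to D is itself well linked.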

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
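
Recall and precision, mentioned above as measures of the quality of a result set, can be computed as in the following sketch. The document identifiers are made up, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 4 documents retrieved, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(p)            # 0.5
print(round(r, 2))  # 0.67
```

In practice the complete set of relevant documents on the WWW is unknown, so recall can only be estimated.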

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
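
When a search system does not offer truncation, the searcher can expand a truncated term by hand into an OR query. The sketch below illustrates the idea, assuming a small word list is at hand; the vocabulary and the function name `expand_truncation` are hypothetical.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into an OR query."""
    if not pattern.endswith("*"):
        return pattern
    stem = pattern[:-1].lower()
    matches = sorted(w for w in vocabulary if w.lower().startswith(stem))
    return " OR ".join(matches) if matches else stem


VOCABULARY = ["library", "libraries", "librarian", "liberty"]
print(expand_truncation("librar*", VOCABULARY))
# librarian OR libraries OR library
```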

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
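
To make clear what a proximity operator such as NEAR does in systems that offer it, here is a minimal sketch. It assumes that "near" means "within a given number of words"; the exact definition differs between retrieval systems, and the function name `near` is hypothetical.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "Open access journals make scientific articles available free of charge."
print(near(doc, "open", "journals", max_distance=2))  # True
print(near(doc, "open", "charge", max_distance=2))    # False
```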

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
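
The idea of grouping results on the fly, using only the titles and snippets that come back from a search, can be illustrated with a very simple sketch. This is NOT the method of Vivisimo, which is not described here; it is a toy heuristic (assumption) that files each result under the snippet word it shares with the most other results. All data and names are made up.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "of", "the", "to"}


def cluster_results(results, query):
    """Group (title, snippet) results under the most widely shared snippet word."""
    ignore = STOPWORDS | set(query.lower().split())
    tokens = [set(snippet.lower().split()) - ignore for _, snippet in results]
    # Document frequency: in how many snippets does each word occur?
    df = Counter(word for words in tokens for word in words)
    clusters = defaultdict(list)
    for (title, _), words in zip(results, tokens):
        shared = [w for w in words if df[w] > 1]
        # Pick the most frequent shared word; ties are broken alphabetically.
        label = min(shared, key=lambda w: (-df[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


RESULTS = [
    ("Blaise Pascal", "french philosopher and mathematician"),
    ("Pascal tutorial", "introduction to the pascal programming language"),
    ("Free Pascal", "pascal programming language compiler"),
    ("Pensees", "writings of the french philosopher"),
]
print(cluster_results(RESULTS, "pascal"))
# {'french': ['Blaise Pascal', 'Pensees'], 'language': ['Pascal tutorial', 'Free Pascal']}
```

No index of the documents is needed beforehand: the headings are derived from the result list itself at search time.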

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com
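
The multi-threaded principle of the scheme above can be sketched as follows: one client program sends the same query to several search systems in parallel and collects the answers. The two "engines" are stand-in functions that return made-up results (assumption); a real client would contact the search systems over the Internet. This does not describe how Copernic works internally.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real search systems: each returns a ranked list of URLs.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]


def meta_search(query, engines):
    """Send the same query to all engines in parallel; collect results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"A": engine_a, "B": engine_b})
print(results["A"])  # ['http://example.org/1', 'http://example.org/2']
```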

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based information
source would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
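
The integration of results with removal of repeated results can be sketched as follows. The merging rule (round-robin over the ranked lists) and the URL normalisation (ignoring "www." and a trailing slash) are assumptions chosen for the illustration; real meta-search systems may merge and compare differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


merged = merge_results([
    ["http://www.example.org/a", "http://example.org/b"],
    ["http://example.org/a/", "http://example.org/c"],
])
print(merged)
# ['http://www.example.org/a', 'http://example.org/b', 'http://example.org/c']
```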

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
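
Tracking changes in an existing page can be done by storing a fingerprint of the page text and comparing it with a fingerprint taken later. The sketch below shows this principle with made-up page texts; fetching the page over the Internet is left out, and the function names are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, new_text):
    """True if the page text differs from the version seen earlier."""
    return fingerprint(new_text) != stored_fingerprint


stored = fingerprint("Library opening hours: 9-17")
print(has_changed(stored, "Library opening hours: 9-17"))  # False
print(has_changed(stored, "Library opening hours: 9-18"))  # True
```

Only the fingerprint has to be kept between two checks, not the complete page.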

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.
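
An alert service that works with stored search queries can be sketched as follows: the same query is repeated later, and only results that were not reported before are sent to the user. This is an illustration of the principle under that assumption, with made-up data and hypothetical names; it is not a description of how Google implements its service.

```python
def new_results(seen_urls, current_urls):
    """Return the URLs that appear now but were not seen in earlier runs."""
    return [url for url in current_urls if url not in seen_urls]


# Hypothetical stored profile: one saved query with the URLs already reported.
profile = {"query": "open access journals", "seen": {"http://example.org/a"}}

# Result list of a later run of the same query (made-up data).
latest = ["http://example.org/a", "http://example.org/b"]

alerts = new_results(profile["seen"], latest)
profile["seen"].update(alerts)  # remember what has been reported
print(alerts)  # ['http://example.org/b']
```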

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
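
Matching new books against a stored interest profile can be sketched as follows. The profile structure, the book records and the function name `matches_profile` are hypothetical; they only illustrate the two profile forms named above (keywords and subject categories).

```python
def matches_profile(book, profile):
    """True if a new book fits the stored keywords or subject categories."""
    title_words = set(book["title"].lower().split())
    keyword_hit = bool(title_words & profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


profile = {
    "keywords": {"aquaculture", "fisheries"},
    "categories": {"Marine science"},
}
new_books = [
    {"title": "Sustainable Aquaculture", "category": "Agriculture"},
    {"title": "Ocean Currents", "category": "Marine science"},
    {"title": "Medieval Cooking", "category": "History"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
print(alerts)  # ['Sustainable Aquaculture', 'Ocean Currents']
```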

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also the origin of the image with a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface,
not the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ (available since 2005)

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
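
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny thesaurus, its entries and the function names `expand_term` and `build_query` are invented here, and the Boolean syntax of the resulting query must be adapted to the database that is searched.

```python
# Toy thesaurus: every entry and relation below is made up for illustration.
THESAURUS = {
    "sea": {
        "BT": ["water body"],     # broader term
        "NT": ["bay", "gulf"],    # narrower terms
        "RT": ["coast"],          # related term
        "UF": ["ocean"],          # synonym (used for)
    },
}


def expand_term(term, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found via the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(concepts, relations=("UF", "NT")):
    """Combine the variants of each concept with OR, and the concepts with AND."""
    groups = []
    for concept in concepts:
        variants = expand_term(concept, relations)
        # Multi-word terms are quoted so that they are searched as a phrase.
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(build_query(["sea", "pollution"]))
```

Adding synonyms and narrower terms with OR tends to increase recall, while combining the concepts with AND keeps precision under control.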

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
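
The two directions of citation searching can be illustrated with a small Python sketch. The citation data and the function names are hypothetical; a real citation index holds the same kind of relation for millions of documents.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed, cites, depth=2):
    """Follow reference lists back in time, at most `depth` steps from the seed."""
    found = set()
    frontier = {seed}
    for _ in range(depth):
        # Collect everything cited by the current frontier that is not yet known.
        frontier = {ref for doc in frontier for ref in cites.get(doc, [])} - found
        found |= frontier
    return found


def citing_documents(target, cites):
    """What a citation index offers: the later documents that cite `target`."""
    return {doc for doc, refs in cites.items() if target in refs}


print(sorted(snowball("seed", CITES)))          # older publications
print(sorted(citing_documents("seed", CITES)))  # more recent publications
```

Snowballing only needs the reference lists of the documents themselves, whereas finding the citing documents requires an index that covers the whole collection.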

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
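
A small helper can generate these variants before they are entered in a link search. This is a sketch only: the function name `url_variants` and the assumption that the default page is called index.html or index.htm are choices made here, not rules of any particular search system.

```python
def url_variants(url):
    """Return common spellings of a URL that other pages may link to."""
    base = url.rstrip("/")
    # Strip a default page name, if present, to get the bare directory address.
    for index_name in ("/index.html", "/index.htm"):
        if base.endswith(index_name):
            base = base[: -len(index_name)]
            break
    return [base, base + "/", base + "/index.html"]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant is then searched separately, and the results are combined by hand or with an OR operator if the search system supports one.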

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 206

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
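
The mechanism described above can be sketched in Python as an inverted index. This is a minimal illustration, assuming that the pages have already been collected by a crawler; the URLs, the texts and the function names are invented, and real search engines add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Stand-in for pages collected by a "robot"; URLs and texts are invented.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: each word maps to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(PAGES)
print(sorted(search(index, "marine biology")))
```

The query is answered from the index alone, which is why such a system does not have to contact the original web servers at search time.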

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
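
The idea of ranking by links can be illustrated with a simplified, textbook-style link analysis in Python. This is an assumption-laden sketch and not the algorithm actually used by Google: the link data, the function name `link_rank`, the damping value and the number of iterations are all chosen here for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from highly scored pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its current score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web of four pages: a, b and d link to c; c links to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(LINKS)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy example page c ranks highest because many pages point to it, and page a ranks above b and d because the "important" page c points to it.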

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning in the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
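
Recall and precision can be computed for a test search when the set of relevant documents is known. A minimal sketch, using the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents are relevant, 2 of them were found
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
# p is 0.5 (2 of 4 retrieved are relevant); r is 2/3 (2 of 3 relevant are found)
```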

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
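
Truncation means that a query term such as librar* matches every word that starts with librar. A minimal sketch of right truncation, assuming the asterisk notation used by many bibliographic retrieval systems; the function name and the word list are invented for the illustration.

```python
def matches_truncated(term, word):
    """True when word matches a term that may end in the truncation sign *."""
    if term.endswith("*"):
        return word.lower().startswith(term[:-1].lower())
    return word.lower() == term.lower()


words = ["library", "Libraries", "librarian", "liberty"]
found = [w for w in words if matches_truncated("librar*", w)]
# found is ["library", "Libraries", "librarian"]
```

In a system without truncation, the searcher has to type the variants explicitly, for instance combined with OR.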

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
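
What a proximity operator does can be sketched as follows. This is an illustration of the general NEAR idea, not the syntax of a particular system; the function near and the default distance of 3 words are assumptions.

```python
def near(text, word_a, word_b, distance=3):
    """True when word_a and word_b occur at most `distance` word positions apart."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= distance
               for i in positions_a for j in positions_b)


sentence = "Open access to scientific journals is growing."
close_enough = near(sentence, "open", "journals", distance=4)   # True
too_far = near(sentence, "open", "journals", distance=3)        # False
```

A plain AND query would accept both cases, which is why a proximity operator can raise precision.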

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
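
Grouping results on the fly can be illustrated with a very naive sketch: the titles returned for a query are grouped under words that they share. This is not Vivisimo's method, which is not described here; the stop-word list, the function cluster_by_term and the result titles are assumptions made for the example.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_by_term(titles, query):
    """Group result titles under each shared term (query and stop words excluded)."""
    clusters = defaultdict(list)
    for title in titles:
        words = {w.strip(".,:;").lower() for w in title.split()}
        for word in sorted(words - STOPWORDS - {query.lower()}):
            clusters[word].append(title)
    # keep only headings shared by at least two results
    return {term: items for term, items in clusters.items() if len(items) >= 2}


titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Pascal the philosopher: life and work",
]
clusters = cluster_by_term(titles, "pascal")
# two headings remain: "programming" and "philosopher"
```

No index of the documents is built beforehand: the grouping uses only the result list that comes back for the query.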

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than one Internet-based
information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
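
The time saving comes from sending one query to several search systems in parallel. A minimal sketch of this principle; engine_one and engine_two are hypothetical stand-ins that return fixed lists, where a real client program would contact search systems over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # hypothetical stand-in for a real search system
    return [f"http://one.example/{query}/1", f"http://one.example/{query}/2"]


def engine_two(query):
    # hypothetical stand-in for a second search system
    return [f"http://two.example/{query}/1"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; return results per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("ecology", {"one": engine_one, "two": engine_two})
```

The user types the query once; the waiting time is roughly that of the slowest engine instead of the sum of all engines.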

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
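
The integration step can be sketched as merging ranked result lists and dropping URLs that were already seen. The normalisation rules below are deliberately crude and are an assumption for this illustration; real meta-search systems may compare results in other ways.

```python
def normalise(url):
    """Crude normalisation: ignore case, scheme, 'www.' and a trailing slash."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


merged = merge_results([
    ["http://www.vub.ac.be/", "http://example.org/a"],
    ["http://vub.ac.be", "http://example.org/b"],
])
# the second copy of the vub.ac.be home page is removed
```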

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
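
The difference between a modified page and a new page can be shown with a small sketch. It assumes that the tracking service stores a fingerprint of each page it has seen; the functions and URLs are hypothetical, and the page texts are given directly instead of being fetched from the WWW.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page content changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with fresh page texts.

    previous: url -> fingerprint stored at the last visit
    current:  url -> page text as found now
    """
    report = {"new": [], "modified": []}
    for url, text in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(text):
            report["modified"].append(url)
    return report


stored = {
    "http://a.example/": fingerprint("version 1"),
    "http://b.example/": fingerprint("same text"),
}
found_now = {
    "http://a.example/": "version 2",
    "http://b.example/": "same text",
    "http://c.example/": "brand new page",
}
report = changed_pages(stored, found_now)
```

An alert service would send the report to the subscriber by email and then store the new fingerprints.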

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
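
How a stored interest profile can be matched against new books is sketched below. The profile structure, the field names and the book records are assumptions for the illustration; they do not describe the system of any particular bookshop.

```python
def matches_profile(book, profile):
    """True when a book matches a keyword or a subject category of the profile."""
    title_words = {w.strip(".,:;").lower() for w in book["title"].split()}
    keywords = {k.lower() for k in profile.get("keywords", [])}
    keyword_hit = bool(title_words & keywords)
    subject_hit = book.get("subject") in profile.get("subjects", [])
    return keyword_hit or subject_hit


profile = {
    "keywords": ["oceanography", "plankton"],
    "subjects": ["Marine biology"],
}
new_books = [
    {"title": "Plankton ecology: an introduction", "subject": "Ecology"},
    {"title": "Coastal law", "subject": "Law"},
    {"title": "Fish of the North Sea", "subject": "Marine biology"},
]
alerts = [b["title"] for b in new_books if matches_profile(b, profile)]
# the first book matches by keyword, the third by subject category
```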

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also small versions of the images, shown
directly in the result list (so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also the origin, shown directly as a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ (available since 2005)

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors taken from a controlled list of words and terms,
the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms,
and can then use these to formulate a query for that
database (to increase recall and precision).
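
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample thesaurus entry, the class name ThesaurusEntry and the functions expand_concept and build_query are hypothetical, and the AND / OR query syntax must be adapted to the database that is actually searched.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with the usual thesaurus relations."""
    term: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF (synonyms)


# Hypothetical sample data, invented for illustration only.
THESAURUS = {
    "ocean": ThesaurusEntry(
        term="ocean",
        broader=["water body"],
        narrower=["Atlantic Ocean", "Pacific Ocean"],
        related=["marine science"],
        used_for=["sea"],
    ),
}


def expand_concept(word: str, include_narrower: bool = True) -> str:
    """Return the word together with its synonyms (and narrower terms) as an OR-group."""
    entry = THESAURUS.get(word)
    terms = [word]
    if entry is not None:
        terms += entry.used_for
        if include_narrower:
            terms += entry.narrower
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[str]) -> str:
    """Combine the expanded concepts with AND."""
    return " AND ".join(expand_concept(c) for c in concepts)


print(build_query(["ocean", "pollution"]))
# (ocean OR sea "Atlantic Ocean" ... ) style output: see the function above
```

Adding synonyms and narrower terms with OR mainly serves recall; combining the concepts with AND mainly serves precision.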

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
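
The two directions of citation searching can be illustrated with a small sketch. The citation data below are invented, and the function names snowball and build_citation_index are hypothetical; a real citation index is of course far larger and is built by the database producer.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(seed: str, references: dict[str, list[str]]) -> set[str]:
    """Follow reference lists back in time, starting from the seed document."""
    found: set[str] = set()
    to_visit = [seed]
    while to_visit:
        doc = to_visit.pop()
        for cited in references.get(doc, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def build_citation_index(references: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the reference lists: cited document -> documents that cite it."""
    index: dict[str, set[str]] = {}
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


older = snowball("seed", REFERENCES)                   # older relevant publications
newer = build_citation_index(REFERENCES).get("seed", set())  # more recent citing publications
print(sorted(older), sorted(newer))
```

The snowball search needs only the documents themselves, whereas the search for more recent publications needs the inverted index, which is why a citation index is required for that direction.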

211

***-

Information retrieval:
using citations (scheme)
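Scheme, along a time axis (past, now, future):
• Snowball citation searching leads from the seed document
back to older publications.
• Citation indexing leads from the seed document forward to
later publications that cite it.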

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
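
Generating these variants can be automated. The sketch below is a minimal illustration; the function name url_variants is hypothetical, and it assumes that the default page of the site is called index.html or index.htm, which is not true for every web server.

```python
def url_variants(url: str) -> list[str]:
    """Return the spelling variants of a URL that other pages may link to."""
    # Drop the scheme (http://), as in the variants listed above.
    base = url.split("://", 1)[-1]
    # Assumption: the default page is index.html or index.htm.
    for default_page in ("index.html", "index.htm"):
        if base.endswith("/" + default_page):
            base = base[: -len(default_page)]
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the result lists can be merged.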

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.
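
The difference between the two orderings can be shown with a tiny sketch. The site names and link counts are invented, and counting inbound links is only the simplified view given above; it is not a description of the actual method used by Google.

```python
# Hypothetical directory category and inbound link counts (invented numbers).
CATEGORY = ["gamma.example", "alpha.example", "beta.example"]
INLINKS = {"alpha.example": 12, "beta.example": 340, "gamma.example": 57}


def alphabetical(sites: list[str]) -> list[str]:
    """Order the sites of a category alphabetically."""
    return sorted(sites)


def by_inlinks(sites: list[str], inlinks: dict[str, int]) -> list[str]:
    """Order the sites by the number of links received, most-linked first."""
    # Sites with equal counts fall back to alphabetical order.
    return sorted(sites, key=lambda site: (-inlinks.get(site, 0), site))


print(alphabetical(CATEGORY))
print(by_inlinks(CATEGORY, INLINKS))
```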

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
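
The mechanism can be illustrated with a very small sketch: an index is built in advance from collected pages, and a query is answered from that index, not from the live Internet. The pages below are invented stand-ins for what a robot has collected, and the function names are hypothetical; real systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what the "robot" has collected.
PAGES = {
    "http://example.org/a": "Open access journals in marine science",
    "http://example.org/b": "Marine biology databases and journals",
    "http://example.org/c": "Search engines for images",
}


def tokenize(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build the index: word -> set of URLs of pages that contain the word."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs of pages that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "marine journals"))
```

Because only the index is consulted at query time, a page that has changed or disappeared since the last visit of the robot can still appear in the results.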

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
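
The idea that both the number and the importance of the pointing pages count can be sketched with a simplified iterative link analysis, in the style of the published PageRank idea. This is an assumption-laden illustration, not the ranking that Google actually uses: the link data are invented, and the function name link_rank, the damping value 0.85 and the number of iterations are choices made only for this sketch.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              iterations: int = 50) -> dict[str, float]:
    """Give each page a score based on the scores of the pages that link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
RANK = link_rank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this example, page c scores well because two pages point to it, and page d scores well although only one page points to it, because that page (c) is itself important; page a also receives one link, but from an unimportant page, and scores lower.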

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
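
What such an automatic expansion does can be sketched as follows. This is only an illustration of the principle, assuming a tiny hypothetical synonym list (`SYNONYMS`); how Google selects synonyms internally is not described here.

```python
# Hypothetical mini-thesaurus; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("rent ~car brussels"))
# prints: rent (car OR automobile OR vehicle) brussels
```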

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
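
Recall and precision, mentioned above as the measures on which specialised systems can score better, can be computed as in this minimal sketch. The document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).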

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
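
To make clear what is missing, this sketch shows what manual right-hand truncation does in systems that support it: a pattern such as `librar*` matches every index word that starts with the given stem. The function name and the small vocabulary are assumptions for illustration only.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol, only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical index vocabulary.
vocabulary = ["library", "libraries", "librarian", "liberal", "book"]
print(truncation_match("librar*", vocabulary))
# prints: ['librarian', 'libraries', 'library']
```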

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
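
A proximity operator such as NEAR, where a system offers it, checks that two words occur close to each other in the text. A minimal sketch of that test follows; the function name `near`, the default distance and the sample sentence are assumptions for illustration.

```python
def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    positions_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Open access journals are indexed in a free directory of scholarly journals."
print(near(sentence, "open", "journals", max_distance=2))   # prints: True
print(near(sentence, "open", "directory", max_distance=3))  # prints: False
```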

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
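
Clustering on the fly can be illustrated with a very small sketch that groups the retrieved snippets under the most widely shared word that is not part of the query. This is a toy heuristic for illustration, not the method used by Vivisimo; the stop word list and the sample snippets are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared non-query word."""
    query_words = set(query.lower().split())

    def words(snippet):
        found = {w.strip(".,;:!?").lower() for w in snippet.split()}
        return found - STOPWORDS - query_words - {""}

    # Count in how many snippets each candidate label word occurs.
    doc_freq = Counter(w for s in snippets for w in words(s))
    clusters = defaultdict(list)
    for s in snippets:
        candidates = words(s)
        if candidates:
            # Prefer the most widely shared word; break ties alphabetically.
            label = min(candidates, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(s)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal".
results = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming compiler",
    "Blaise Pascal biography",
]
groups = cluster_results("pascal", results)
```

With these snippets, the results fall into two groups labelled "programming" and "blaise", computed only from the retrieved results and without any processing before the search.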

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
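
The integration step, with parallel querying and removal of repeated results, can be sketched as follows. The two "engines" are stand-in functions that return fixed lists (no network access), and the URL normalisation rule is a simplifying assumption; real meta-search systems use their own, more elaborate merging methods.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def normalise(url):
    """Comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def meta_search(query, engines):
    """Query all engines in parallel and merge their ranked lists without duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    # Interleave: take rank 1 from every engine, then rank 2, and so on.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Stand-ins for real search systems (hypothetical).
def engine_one(query):
    return ["http://www.example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/a/", "http://example.org/c"]


print(meta_search("test", [engine_one, engine_two]))
# prints: ['http://www.example.org/a', 'http://example.org/b', 'http://example.org/c']
```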

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
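
How a tracking service can tell modified pages from new ones is sketched below: it stores a fingerprint of each page and compares it at the next visit. This is a minimal illustration under the assumption that the page contents have already been fetched; the function names and URLs are hypothetical, and actual alert services may work differently.

```python
import hashlib


def fingerprint(content):
    """Return a digest of the page content (bytes)."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with freshly fetched pages.

    previous: dict url -> fingerprint from the last check
    current_pages: dict url -> page content (bytes) fetched now
    Returns (changed_urls, new_urls, updated_fingerprints).
    """
    changed, new, updated = [], [], {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
    return changed, new, updated


# First visit: everything is new.
_, first_new, stored = detect_changes({}, {"http://example.org/": b"version 1"})
# Second visit: one page was modified and one page appeared.
changed, new, stored = detect_changes(
    stored,
    {"http://example.org/": b"version 2", "http://example.org/new": b"hello"},
)
```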

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
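
Matching new books against a stored interest profile of keywords can be sketched as follows. The record layout (`title`, `subjects`) and the sample books are assumptions made for illustration; they do not describe how any particular bookshop implements its alerts.

```python
def matching_alerts(profile_keywords, new_books):
    """Return the titles of new books whose title or subjects contain a profile keyword."""
    keywords = [k.lower() for k in profile_keywords]
    alerts = []
    for book in new_books:
        text = (book["title"] + " " + " ".join(book.get("subjects", []))).lower()
        if any(k in text for k in keywords):
            alerts.append(book["title"])
    return alerts


# Hypothetical interest profile and newly published books.
profile = ["marine biology", "oceanography"]
new_books = [
    {"title": "Introduction to Oceanography", "subjects": ["earth sciences"]},
    {"title": "Coastal Ecosystems", "subjects": ["marine biology", "ecology"]},
    {"title": "Medieval History", "subjects": ["history"]},
]
print(matching_alerts(profile, new_books))
# prints: ['Introduction to Oceanography', 'Coastal Ecosystems']
```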

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations between a term and other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
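
The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the tiny thesaurus, its entries and the function name expand_query are hypothetical and do not come from any of the thesaurus systems mentioned in these slides.

```python
# Hypothetical mini-thesaurus; the entries are invented for illustration only.
THESAURUS = {
    "ocean": {
        "UF": ["sea"],               # synonyms (Use(d) For)
        "NT": ["Atlantic Ocean"],    # narrower terms
        "BT": ["water body"],        # broader terms
        "RT": ["oceanography"],      # related terms
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Combine a term with selected thesaurus terms into one OR query."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"' + t + '"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("ocean"))
```

Adding synonyms and narrower terms mainly helps recall; whether broader or related terms should be added as well depends on how much loss of precision the searcher accepts.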

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
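
The two directions of citation searching described above can be sketched as follows. This is a minimal illustration: the document names and the citation data are invented, and a real citation index is of course far larger and is built by the database producer.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "old_b": [],
    "older_c": [],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}


def snowball(doc, references):
    """Follow reference lists back in time, repeatedly (snowball effect)."""
    found, todo = set(), [doc]
    while todo:
        current = todo.pop()
        for cited in references.get(current, []):
            if cited not in found:
                found.add(cited)
                todo.append(cited)
    return found


def citing_documents(doc, references):
    """Look forward in time: which documents cite doc?
    This is the question that a citation index answers."""
    return {source for source, cited in references.items() if doc in cited}


print(sorted(snowball("seed", REFERENCES)))
print(sorted(citing_documents("seed", REFERENCES)))
```

The first function only needs the reference lists in the documents themselves; the second one needs a database that covers many citing documents, which is why citation indexes are required for it.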

211

***-

Information retrieval:
using citations (scheme)
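Scheme (time axis: past, now, future):
• Snowball citation searching leads from the seed document back to
older publications (past).
• Citation indexing leads from the seed document forward to more
recent publications that cite it (future).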

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
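
Counting citations received by an article or by an author can be sketched as follows. The records below are invented for illustration; in practice they would be the results of a citation search, and the counts depend on the coverage of the database that is used.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article) pairs.
CITATIONS = [("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p5", "p2"), ("p5", "p1")]
# Hypothetical authorship of the cited articles.
AUTHORS = {"p1": ["Author A"], "p2": ["Author A", "Author B"]}


def citations_per_article(citations):
    """Count how many times each article is cited."""
    return Counter(cited for _, cited in citations)


def citations_per_author(citations, authors):
    """Credit each citation of an article to all authors of that article."""
    counts = Counter()
    for _, cited in citations:
        for author in authors.get(cited, []):
            counts[author] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, AUTHORS))
```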

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
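
Generating such variants before a link search can be sketched as follows. This is a minimal illustration; the function name url_variants is hypothetical, and only the three variants listed above are produced.

```python
def url_variants(url):
    """Return common spellings of the same web address."""
    # Strip the scheme, if present.
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
    # Strip a default index page, if present.
    for index_page in ("index.html", "index.htm"):
        if url.endswith("/" + index_page):
            url = url[: -len(index_page)]
    base = url.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system, using the query syntax that the system requires.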

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
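
The difference between the two approaches can be sketched as follows. This is a minimal illustration using Python's standard html.parser module; the two sample pages are invented, and a real search engine does this on its own database, not on pages supplied by the user.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of all hyperlinks (<a href="...">) in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def cites_by_link(html, url):
    """Specialised searching: is url the target of a hyperlink?"""
    collector = LinkCollector()
    collector.feed(html)
    return url in collector.links


def cites_by_text(html, url):
    """Normal searching: does the string url occur anywhere in the page?"""
    return url in html


TARGET = "http://www.computer.com/directory/subdirectory/web-page.html"
PAGE_WITH_LINK = '<p><a href="' + TARGET + '">a useful page</a></p>'
PAGE_WITH_TEXT = "<p>See " + TARGET + " for details.</p>"

print(cites_by_link(PAGE_WITH_LINK, TARGET), cites_by_text(PAGE_WITH_LINK, TARGET))
print(cites_by_link(PAGE_WITH_TEXT, TARGET), cites_by_text(PAGE_WITH_TEXT, TARGET))
```

The second page mentions the address without making a hyperlink, so only the more normal text search finds it.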

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
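
The difference between the two orderings can be sketched as follows. This is a simplified illustration: the directory entries and the link counts are invented, and simply counting inbound links is only a stand-in for the link analysis that Google actually applies.

```python
# Hypothetical directory entries with invented inbound-link counts.
ENTRIES = [
    {"title": "Alpha Marine Links", "inbound_links": 12},
    {"title": "Ocean Data Centre", "inbound_links": 340},
    {"title": "Beta Sea Atlas", "inbound_links": 57},
]


def alphabetical(entries):
    """Order entries by title, as in a purely alphabetical listing."""
    ordered = sorted(entries, key=lambda e: e["title"].lower())
    return [e["title"] for e in ordered]


def by_inbound_links(entries):
    """Order entries by the number of links received, most linked first."""
    ordered = sorted(entries, key=lambda e: e["inbound_links"], reverse=True)
    return [e["title"] for e in ordered]


print(alphabetical(ENTRIES))
print(by_inbound_links(ENTRIES))
```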

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: a global subject directory, which covers the complete WWW,
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
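
The mechanism described above can be sketched as a small inverted index. This is a minimal illustration: the pages and URLs are invented, the "robot" is replaced by a fixed dictionary of page texts, and real systems add ranking, phrase searching and much more.

```python
from collections import defaultdict

# Hypothetical pages, as if already collected from the Internet by a "robot".
PAGES = {
    "http://example.org/a": "open access journals in marine science",
    "http://example.org/b": "marine biology image archive",
    "http://example.org/c": "open archives for physics",
}


def build_index(pages):
    """Build the index: each word points to the URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain all words of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "open marine")))
```

A query is answered from the prepared index only, not by visiting the pages again; this is why searching is fast, and also why results can be out of date.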

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
• (So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
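
The idea of ranking a page higher when many pages, and "important" pages, point to it can be illustrated with a minimal sketch. This is a simplified, PageRank-style iteration written for illustration only; it is not Google's actual algorithm. The page names, the damping factor of 0.85 and the number of iterations are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages point to "hub", one points to "lonely".
example_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a"],
    "lonely": [],
}
scores = link_rank(example_links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the page "hub" ends up first because three pages link to it, and page "a" comes second because the highly ranked "hub" links to it.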

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
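
As a rough illustration of expanding a query with thesaurus terms, the sketch below combines each query word with its synonyms in an OR group. The tiny thesaurus and the function name are hypothetical, and the sketch assumes that the retrieval system accepts OR, parentheses and quoted phrases.

```python
def expand_with_thesaurus(words, thesaurus):
    """Replace each word that has synonyms by an OR group of alternatives."""
    parts = []
    for word in words:
        alternatives = [word] + [t for t in thesaurus.get(word, []) if t != word]
        if len(alternatives) > 1:
            # Multi-word terms are quoted so that they are searched as a phrase.
            quoted = ['"%s"' % t if " " in t else t for t in alternatives]
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(word)
    return " ".join(parts)


# Hypothetical thesaurus entry, chosen by the searcher.
toy_thesaurus = {"car": ["automobile", "motor vehicle"]}
expanded = expand_with_thesaurus(["car", "rental"], toy_thesaurus)
```

Here the searcher still decides which synonyms enter the thesaurus, so this remains a manual, intellectual expansion that is only assembled automatically.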

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
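
A query of this form can also be assembled by a small helper, sketched here under the assumption that the tilde operator behaves as described on this slide. The function name and the example words are hypothetical.

```python
def build_query(words, expand=()):
    """Join query words, putting a tilde in front of the words to expand."""
    expand = set(expand)
    return " ".join("~" + w if w in expand else w for w in words)


query = build_query(["cheap", "car", "rental"], expand=["car"])
```

The resulting string is `cheap ~car rental`, which asks for synonyms of the second word only.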

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the meaning of
the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
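
Recall and precision, mentioned above, can be computed for a single search as in the following minimal sketch. The document identifiers are invented; in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
```

In this example 2 of the 4 retrieved documents are relevant (precision 0.5) and 2 of the 3 relevant documents were found (recall about 0.67).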

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
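
To give an impression of clustering results on the fly, the sketch below groups result snippets using only the words found in the retrieved snippets themselves. It is a deliberately naive illustration and not Vivisimo's method; the stop-word list, the example snippets and the labelling rule are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query term."""
    query_terms = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        terms = {t for t in snippet.lower().split()
                 if t not in STOPWORDS and t not in query_terms}
        tokenised.append(terms)
    # Count in how many snippets each term occurs.
    df = Counter(t for terms in tokenised for t in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, tokenised):
        shared = [t for t in terms if df[t] > 1]
        # Label: the term shared with the most other snippets (ties: alphabetical).
        label = max(shared, key=lambda t: (df[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal philosopher biography",
    "Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", snippets)
```

No index of the documents is built in advance: the grouping is derived from the result list at search time, which is the sense of "on the fly" used above.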

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
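
The integration of results with removal of repeated results can be sketched as follows. The two "engines" are stand-in functions that return fixed lists, since real search systems each have their own interface; the URLs and function names are hypothetical. The engines are queried in parallel threads, in the spirit of a multi-threaded search system.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_one(query):
    # Stand-in for a real search system; returns fixed example results.
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    # A second stand-in engine with one overlapping result.
    return ["http://example.org/b/", "http://example.org/c"]


def meta_search(query, engines):
    """Query all engines in parallel and merge results without repeats."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the result lists in the order of the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            # Simplistic duplicate test: ignore a trailing slash only.
            key = url.rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


merged = meta_search("open access", [engine_one, engine_two])
```

The merged list contains three URLs instead of four, because the result returned by both engines is kept only once.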

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files, PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
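
The difference between tracking modified pages and finding new pages can be sketched with stored fingerprints of page texts. This is only an illustration of the principle, assuming that the page texts have already been fetched; the URLs and texts are invented, and real alert services may work quite differently.

```python
import hashlib


def fingerprint(page_text):
    """Hash the page text after collapsing runs of whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(snapshot, known):
    """Classify each page as 'new', 'modified' or 'unchanged'.

    snapshot: {url: current text}; known: {url: fingerprint seen before}.
    The dict 'known' is updated so that it can be reused for the next check.
    """
    report = {}
    for url, text in snapshot.items():
        digest = fingerprint(text)
        if url not in known:
            report[url] = "new"
        elif known[url] != digest:
            report[url] = "modified"
        else:
            report[url] = "unchanged"
        known[url] = digest
    return report


known = {}
first = check_pages({"http://example.org/": "Hello  world"}, known)
second = check_pages({"http://example.org/": "Hello world",
                      "http://example.org/news": "Item 1"}, known)
third = check_pages({"http://example.org/": "Hello world!"}, known)
```

In the second check the home page counts as unchanged because only the spacing differs, while the news page is reported as new; in the third check the home page is reported as modified.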

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term in a thesaurus:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
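
The use of a thesaurus before querying can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny hand-made thesaurus: the dictionary THESAURUS, its entries and the function expand_query are hypothetical and do not come from any of the systems mentioned in these slides.

```python
# Hypothetical toy thesaurus, invented for illustration only.
# Each term lists its BT, NT, RT and UF relations.
THESAURUS = {
    "ocean": {
        "BT": ["body of water"],
        "NT": ["Atlantic Ocean", "Pacific Ocean"],
        "RT": ["marine science"],
        "UF": ["sea"],
    },
}


def expand_query(term, thesaurus, include_narrower=True):
    """Build an OR query from a term, its synonyms (UF) and,
    optionally, its narrower terms (NT)."""
    entry = thesaurus.get(term, {})
    words = [term] + entry.get("UF", [])
    if include_narrower:
        words += entry.get("NT", [])
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = ['"%s"' % w if " " in w else w for w in words]
    return " OR ".join(quoted)


print(expand_query("ocean", THESAURUS))
```

Adding synonyms and narrower terms mainly helps recall; replacing a vague word by a more suitable term found in the thesaurus mainly helps precision.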

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
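
The two directions of citation searching can be illustrated with a small Python sketch. The document identifiers and the reference lists in CITES are hypothetical toy data; real citation indexes are of course far larger and are built by the database producers.

```python
# Hypothetical toy data: each document maps to the documents it cites.
CITES = {
    "seed-2000": ["a-1995", "b-1998"],
    "a-1995": ["c-1990"],
    "b-1998": ["c-1990"],
    "d-2003": ["seed-2000"],
    "e-2004": ["seed-2000", "d-2003"],
}


def snowball(seed, cites):
    """Follow reference lists back in time, starting from the seed
    (the snowball effect): returns all older documents reached."""
    found, todo = set(), [seed]
    while todo:
        doc = todo.pop()
        for ref in cites.get(doc, []):
            if ref not in found:
                found.add(ref)
                todo.append(ref)
    return found


def build_citation_index(cites):
    """Invert the reference lists: cited document -> citing documents."""
    index = {}
    for doc, refs in cites.items():
        for ref in refs:
            index.setdefault(ref, set()).add(doc)
    return index


print(sorted(snowball("seed-2000", CITES)))
print(sorted(build_citation_index(CITES)["seed-2000"]))
```

The first function only needs the reference lists of the documents themselves; the second needs an index over many documents, which is why the forward direction depends on a citation index.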

211

***-

Information retrieval:
using citations (scheme)
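[Scheme: a time axis running from past through now to future.
Starting from the seed document, snowball citation searching
leads back to older publications; citation indexing leads
forward to more recent publications that cite the seed.]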

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
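
Generating these variants by hand is error-prone, so a small helper can be sketched in Python. The function name link_query_variants is hypothetical, and the sketch assumes that index.html is the default page name of the site, which differs between web servers.

```python
def link_query_variants(url):
    """Return the variants of a URL worth trying in a link search.

    Assumption: "index.html" is the default page of the directory.
    """
    # Drop the scheme so that the variants start with "//".
    rest = url.split("//", 1)[-1]
    variants = ["//" + rest]
    if rest.endswith("/index.html"):
        base = rest[: -len("index.html")]  # still ends with "/"
        variants += ["//" + base, "//" + base.rstrip("/")]
    elif rest.endswith("/"):
        variants.append("//" + rest.rstrip("/"))
    return variants


for variant in link_query_variants(
    "http://web-server-computer.country/website/index.html"
):
    print(variant)
```

Each variant is then used as a separate query in the link search syntax of the chosen search system, and the results are combined.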

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 209

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically (machine-made) on the basis of the contents,
the texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
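
The principle of such an index can be sketched in Python as a so-called inverted index. This is only a toy illustration: the pages in PAGES are hypothetical stand-ins for what a robot would collect, and real search engines add ranking, phrase searching and much more.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a robot has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Open access journals in marine science",
    "http://example.org/c": "Hydrology and open water",
}


def build_index(pages):
    """Build the index once, in advance: word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "open marine")))
```

A query is answered by looking up words in the prepared index, not by visiting the pages again, which is why searching is fast but can return pages that have changed or disappeared since they were collected.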

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
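The link-based ranking idea on this slide can be illustrated with a small sketch. This is a simplified PageRank-style iteration, not Google's actual production algorithm; the function name `link_rank`, the damping value 0.85 and the mini-web of page names are assumptions made for illustration only.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links (simplified PageRank-style iteration).

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages point to "hub", one points to "lonely".
example_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "lonely"],
    "hub": ["a"],
    "lonely": [],
}
ranks = link_rank(example_links)
best = max(ranks, key=ranks.get)
```

A page ends up higher when many pages point to it, and also when the pages pointing to it are themselves highly ranked, which matches the two conditions listed above.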

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can tell the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
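What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name `expand_query` are hypothetical; the slides do not say which synonym source Google uses, so this only illustrates the principle.

```python
# Hypothetical, tiny synonym table; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Expand words prefixed with '~' into an OR-group of the word and its synonyms."""
    expanded = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No known synonyms: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(token)
    return " ".join(expanded)


expanded_example = expand_query("cheap ~car rental")
```

Only the word marked with the tilde is expanded; the other words in the query are left as typed.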

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
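Recall and precision, mentioned on this slide, can be computed for a single search as in the sketch below. The document identifiers are made up; in practice the set of all relevant documents is rarely known exactly, so recall is usually an estimate.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

In this example two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).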

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
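Manual truncation, as offered by many classical retrieval systems, can be sketched as follows. The `*` wildcard is a common convention in such systems; the function name `truncate_match` and the small vocabulary are assumptions for illustration.

```python
import fnmatch


def truncate_match(pattern, vocabulary):
    """Return the vocabulary words matched by a truncated term such as 'librar*'."""
    pattern = pattern.lower()
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w.lower(), pattern))


# Hypothetical index vocabulary.
words = ["library", "libraries", "librarian", "liberty"]
matches = truncate_match("librar*", words)
```

One truncated term covers several word forms at once; without this feature the searcher has to type each variant separately.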

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
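A proximity operator such as NEAR, as found in some other retrieval systems, can be sketched like this. The function name `near` and the default distance of 5 words are assumptions; real systems differ in how they count distance.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


sample = "Open access journals are listed in a directory of scientific information sources"
close_together = near(sample, "journals", "directory")
far_apart = near(sample, "open", "sources")
```

Compared with a plain Boolean AND, a proximity operator rejects documents in which the two words occur far apart and are probably unrelated.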

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
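Clustering on the fly can be illustrated with a very simple sketch that groups result titles under the words they share. This is not Vivisimo's own (proprietary) method; the function name `cluster_results`, the stop word list and the sample titles are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical, minimal stop word list.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to"}


def cluster_results(query, titles, min_size=2):
    """Group result titles under words they share (other than the query words)."""
    query_words = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        words = set(title.lower().split()) - query_words - STOPWORDS
        for word in words:
            clusters[word].append(title)
    # Keep only headings that group at least min_size results.
    return {word: group for word, group in clusters.items() if len(group) >= min_size}


# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Blaise Pascal philosopher biography",
    "Pascal programming tutorial",
    "Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", titles)
```

The grouping is computed from the retrieved results only, at search time, so no pre-processing of the documents is needed.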

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
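The time saving comes from sending one query to several search systems in parallel. A minimal sketch, assuming two stand-in functions (`engine_one`, `engine_two`) in place of real search systems, which would be contacted over the network:

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real search systems; each returns a list of result URLs.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]


def meta_search(query, engines):
    """Send the same query to every engine in parallel; return the results per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


results = meta_search("open access", {"one": engine_one, "two": engine_two})
```

The user types the query once, in one interface; the total waiting time is roughly that of the slowest search system instead of the sum of all of them.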

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
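Integration with removal of repeated results can be sketched as below. The round-robin merging order and the crude URL normalisation (lower case, trailing slash removed) are assumptions for illustration; actual meta-search systems use their own merging rules.

```python
def merge_results(result_lists):
    """Merge ranked result lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                # Crude normalisation so trivially different copies count as repeats.
                key = url.lower().rstrip("/")
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


merged = merge_results([
    ["http://a.org/", "http://b.org"],
    ["http://b.org/", "http://c.org"],
])
```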

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
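Tracking changes in an existing page can be done by storing a fingerprint of its contents and comparing it at the next visit. A minimal sketch, in which the page contents are given as strings (fetching the page is left out) and the function names are assumptions:

```python
import hashlib


def fingerprint(page_content):
    """Return a short, fixed-length fingerprint of the page contents."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_content):
    """Compare the stored fingerprint with that of the current page contents."""
    return fingerprint(current_content) != stored_fingerprint


# Hypothetical page contents at two moments in time.
stored = fingerprint("<html>version 1</html>")
unchanged = has_changed(stored, "<html>version 1</html>")
changed = has_changed(stored, "<html>version 2</html>")
```

An alert service would run such a comparison at regular intervals and send an email when the result is True. Finding completely new pages is a different task, which requires repeating a stored search query.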

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
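Matching new books against a stored interest profile of keywords can be sketched as follows. The record layout (`title`, `subjects`) and the sample books are hypothetical; this is not how any particular bookshop implements its alert service.

```python
def matches_profile(book, profile_keywords):
    """Return True if any profile keyword occurs in the book's title or subjects."""
    text = " ".join([book["title"]] + book.get("subjects", [])).lower()
    return any(keyword.lower() in text for keyword in profile_keywords)


def select_alerts(new_books, profile_keywords):
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile_keywords)]


# Hypothetical new book records and a stored interest profile.
new_books = [
    {"title": "Marine Ecology Handbook", "subjects": ["oceanography"]},
    {"title": "Cooking for Beginners", "subjects": ["food"]},
]
alerts = select_alerts(new_books, ["ecology", "aquaculture"])
```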

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
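
The relations in this scheme can be held in a small data structure. The following is a minimal sketch in Python, not taken from any particular thesaurus system; the class name `ThesaurusEntry`, the function `describe` and the sample terms are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One term with its thesaurus relations (hypothetical structure)."""
    term: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (synonyms)


def describe(entry: ThesaurusEntry) -> str:
    """Render an entry in the BT / NT / RT / UF notation."""
    lines = [entry.term]
    for label, terms in (("BT", entry.broader), ("NT", entry.narrower),
                         ("RT", entry.related), ("UF", entry.used_for)):
        for t in terms:
            lines.append(f"  {label} {t}")
    return "\n".join(lines)


# Invented sample entry, for illustration only.
ocean = ThesaurusEntry(
    term="ocean",
    broader=["water body"],
    narrower=["Atlantic Ocean", "Pacific Ocean"],
    related=["marine science"],
    used_for=["sea"],
)
print(describe(ocean))
```

Keeping the four relation types in separate fields makes it possible to decide per search whether to add only synonyms or also narrower and related terms.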

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
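
The procedure described above (look up more terms first, then formulate the query) can be sketched as follows. This is only an illustration under stated assumptions: the mini-thesaurus `THESAURUS` is invented, and the Boolean `AND` / `OR` syntax with quoted phrases is assumed; the actual query syntax differs from one search system to another.

```python
# Hypothetical mini-thesaurus: term -> synonyms (UF) and narrower terms (NT).
THESAURUS = {
    "ocean": {"UF": ["sea"], "NT": ["Atlantic Ocean", "North Sea"]},
    "pollution": {"UF": ["contamination"], "NT": ["oil spill"]},
}


def quote(term: str) -> str:
    """Put multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def expand(concept: str, include_narrower: bool = True) -> str:
    """Return an OR-group of the concept and the terms found in the thesaurus."""
    entry = THESAURUS.get(concept, {})
    terms = [concept] + entry.get("UF", [])
    if include_narrower:
        terms += entry.get("NT", [])
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(concepts: list[str]) -> str:
    """Combine the expanded concepts with AND."""
    return " AND ".join(expand(c) for c in concepts)


print(build_query(["ocean", "pollution"]))
```

Adding synonyms and narrower terms with OR tends to raise recall, while combining the concepts with AND keeps precision under control.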

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
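
The two directions of citation searching can be illustrated with a small sketch. The document identifiers and reference lists below are invented; a real citation index is built from the reference lists of many thousands of publications.

```python
# Hypothetical citation data: each document -> the earlier documents it cites.
REFERENCES = {
    "D2005": ["C2003", "B2001"],
    "C2003": ["B2001", "A1998"],
    "B2001": ["A1998"],
    "A1998": [],
}


def snowball(seed: str) -> set[str]:
    """Follow reference lists backwards in time, starting from the seed."""
    found: set[str] = set()
    to_visit = list(REFERENCES.get(seed, []))
    while to_visit:
        doc = to_visit.pop()
        if doc not in found:
            found.add(doc)
            to_visit.extend(REFERENCES.get(doc, []))
    return found


def build_citation_index() -> dict[str, set[str]]:
    """Invert the reference lists: document -> later documents that cite it."""
    index: dict[str, set[str]] = {doc: set() for doc in REFERENCES}
    for citing, cited_docs in REFERENCES.items():
        for cited in cited_docs:
            index.setdefault(cited, set()).add(citing)
    return index


seed = "B2001"
print("older, via snowballing:", sorted(snowball(seed)))
print("more recent, via citation index:", sorted(build_citation_index()[seed]))
```

Snowballing only needs the documents themselves, whereas the forward direction requires an inverted index that somebody has compiled in advance.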

211

***-

Information retrieval:
using citations (scheme)
[Scheme on a time axis (past, now, future), starting from the seed document:
• snowball citation searching leads back to the past
• citation indexing leads forward to the future]

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
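
Estimating these numbers from the results of a citation search comes down to counting. A minimal sketch, assuming the search results are available as pairs of citing and cited articles; all identifiers and author names below are invented:

```python
from collections import Counter

# Hypothetical search results: (citing article, cited article).
CITATIONS = [
    ("P4", "P1"), ("P4", "P2"), ("P5", "P1"), ("P5", "P3"), ("P6", "P1"),
]
# Hypothetical authorship: article -> authors.
AUTHORS = {"P1": ["Janssens"], "P2": ["Janssens", "Peeters"], "P3": ["Peeters"]}


def citations_per_article(citations):
    """Count how often each article is cited."""
    return Counter(cited for _citing, cited in citations)


def citations_per_author(citations, authors):
    """Add up the citation counts of all articles of each author."""
    counts = Counter()
    for article, n in citations_per_article(citations).items():
        for author in authors.get(article, []):
            counts[author] += n
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, AUTHORS))
```

The counts are only as complete as the database behind the citation search, which is why the result is an estimate.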

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
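
The variants listed above can be generated mechanically before searching for each of them. A minimal sketch; the function name `link_variants` is hypothetical, and the start page name `index.html` is an assumption, since a web server may use another name:

```python
def link_variants(site: str, index_page: str = "index.html") -> list[str]:
    """List the usual spellings of a link to the start page of `site`.

    `site` is given without scheme, for example "www.example.org/website".
    """
    base = site.rstrip("/")
    # Accept input that already ends in the start page name.
    if base.endswith("/" + index_page):
        base = base[: -len(index_page) - 1]
    return [f"//{base}/{index_page}", f"//{base}/", f"//{base}"]


for variant in link_variants("web-server-computer.country/website/"):
    print(variant)
```

Each variant is then used in a separate link search, and the results are merged.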

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 210

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview
of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
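
The simplified picture given above (more links received, higher rank) can be sketched as a plain count of inbound links. This illustrates only that simplification, not Google's actual link analysis, which is more refined; the site names and link data are invented.

```python
from collections import Counter

# Hypothetical link data: (page that contains the link, site linked to).
LINKS = [
    ("p1", "site-b"), ("p2", "site-b"), ("p3", "site-a"),
    ("p4", "site-b"), ("p5", "site-c"), ("p6", "site-c"),
]
# One directory category, sorted alphabetically as in DMOZ.
CATEGORY = ["site-a", "site-b", "site-c"]


def rank_by_inlinks(sites, links):
    """Sort sites by the number of links they receive, most-linked first."""
    inlinks = Counter(target for _source, target in links)
    # Ties fall back to alphabetical order.
    return sorted(sites, key=lambda s: (-inlinks[s], s))


print(rank_by_inlinks(CATEGORY, LINKS))
```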

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, which covers the complete WWW,
can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index, which is built
automatically by machine on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
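
The mechanism can be illustrated with a very small inverted index. This is a sketch under stated assumptions, not the implementation of any real search engine: the pages in `PAGES` are invented and stand in for what a crawler would collect, and a query is treated as an implicit AND of its words.

```python
import re
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}


def tokenize(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build the index: word -> set of URLs of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query: str) -> list[str]:
    """Return the URLs of pages that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))
```

The query is answered from the index only, without contacting the pages themselves, which is why such systems respond quickly but can be out of date.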

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading Internet
database and search engine All the Web and the Internet
database Inktomi, has been owned by one U.S. Internet
company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
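The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be sketched as a simplified PageRank-style iteration. This is an illustration only, not Google's actual implementation; the damping value, the number of iterations and the tiny link graph are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by a simplified PageRank-style power iteration.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D point to C; C points to A.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
RANKS = link_rank(LINKS)
```

In this graph C receives the highest score because three pages point to it, and A scores higher than B or D because the "important" page C points to A.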

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
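How such a tilde operator might expand a query can be sketched as follows. The synonym table and the function name are hypothetical; the synonym list actually used by Google is not documented here.

```python
# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)
```

For instance, `expand_query("~cheap car rental")` yields `(cheap OR inexpensive OR affordable) car rental`: only the word marked with a tilde is expanded.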

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (also serving Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
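Recall and precision, the two measures named above, can be computed as follows when the set of relevant documents is known. The document identifiers are hypothetical; in practice the complete set of relevant documents on the WWW is rarely known, so recall can only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant ones exist.
P, R = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3).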

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
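What manual truncation does in systems that support it can be sketched as follows. The `*` symbol and the small vocabulary are assumptions chosen for illustration; each retrieval system defines its own truncation symbol.

```python
def expand_truncation(term, vocabulary):
    """Expand a right-truncated term such as 'librar*' against a word list."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return [term]

# Hypothetical index vocabulary.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]
```

With this vocabulary, `expand_truncation("librar*", VOCABULARY)` matches "librarian", "libraries" and "library" but not "liberty". Without such a feature, the searcher has to type the variants explicitly.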

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
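A proximity operator such as NEAR, in systems that offer it, can be sketched as a check on word positions. The default distance of 5 words is an assumption; systems differ in the distance they use.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True if word_a and word_b occur within max_distance words of each other."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)
```

A plain AND search would accept a document in which the two words appear pages apart; the proximity check only accepts it when they occur close together.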

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word also has other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
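Clustering on the fly can be sketched as grouping the retrieved results by terms that their short descriptions share. This is a deliberately naive illustration, not the method used by Vivisimo; the stop word list, the result data and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}

def cluster_results(query, results):
    """Group (url, snippet) results under the most widely shared snippet term."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    # Terms of each snippet, without stop words and query words.
    terms = {}
    for url, snippet in results:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        terms[url] = words - STOPWORDS - query_words
    # Count in how many snippets each term occurs.
    doc_freq = Counter(t for words in terms.values() for t in words)
    clusters = defaultdict(list)
    for url, _ in results:
        shared = [t for t in terms[url] if doc_freq[t] > 1]
        # Label = most widely shared term; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-doc_freq[t], t)) if shared else "other"
        clusters[label].append(url)
    return dict(clusters)

# Hypothetical results for the ambiguous query "pascal".
RESULTS = [
    ("u1", "Blaise Pascal, French philosopher and mathematician"),
    ("u2", "Pascal philosopher biography"),
    ("u3", "Pascal programming language tutorial"),
    ("u4", "Turbo Pascal programming compiler"),
    ("u5", "Pascal unit of pressure"),
]
CLUSTERS = cluster_results("pascal", RESULTS)
```

The grouping uses only the results returned for this one query, which is why no pre-processing of the whole document collection is needed.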

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
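The integration step with removal of repeated results can be sketched as follows. The round-robin merging and the URL normalisation rules are assumptions made for illustration; real meta-search systems apply their own merging and ranking rules.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash.

    Query strings are ignored here, which is a simplification.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(*result_lists):
    """Interleave ranked result lists round-robin and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.example.org/", "http://example.org/a"]
ENGINE_2 = ["http://example.org", "http://example.org/b"]
MERGED = merge_results(ENGINE_1, ENGINE_2)
```

Both systems return the same home page under slightly different addresses; after normalisation it appears only once in the merged list.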

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
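How such a tracking program can detect changed or new pages may be sketched by comparing content fingerprints between two visits. Fetching the pages and sending the alert are left out; the function names and the storage format are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Return a fixed-length digest that changes when the page text changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current page contents.

    previous:      dict url -> fingerprint from the last visit
    current_pages: dict url -> page text as fetched now
    Returns (changed_urls, new_urls, updated_fingerprints).
    """
    changed, new = [], []
    updated = dict(previous)
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in previous:
            new.append(url)
        elif previous[url] != digest:
            changed.append(url)
        updated[url] = digest
    return changed, new, updated
```

A first call with an empty `previous` dictionary reports every page as new; later calls with the returned fingerprints report only pages whose text differs from the stored version.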

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
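Simultaneous searching, and its dependence on the availability of the target systems, can be sketched with parallel requests. The three catalogue functions are hypothetical stand-ins for real connectors (for instance via Z39.50 or WWW forms), and the 10 second timeout is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real catalogue connectors.
def search_catalogue_a(query):
    return [f"A: {query} handbook"]

def search_catalogue_b(query):
    return [f"B: introduction to {query}"]

def search_catalogue_down(query):
    raise ConnectionError("catalogue not reachable")

def search_all(query, targets):
    """Send one query to all targets in parallel; report failures separately."""
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        futures = {name: pool.submit(func, query) for name, func in targets.items()}
        for name, future in futures.items():
            try:
                hits.extend(future.result(timeout=10))
            except Exception:
                # An unavailable target does not block the other results.
                failed.append(name)
    return hits, failed

TARGETS = {
    "a": search_catalogue_a,
    "b": search_catalogue_b,
    "down": search_catalogue_down,
}
HITS, FAILED = search_all("oceanography", TARGETS)
```

The query is sent as a plain string to every target, which reflects the limitation noted above: only search options that all target systems understand can be used.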

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
Each result shows not only a thumbnail,
but also the origin with a readable URL;
this makes it easier to judge the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
For a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
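
The procedure described above can be sketched in a few lines of Python. This is only an illustration: the thesaurus entry, the choice of relations to follow, and the OR query syntax are assumptions made for the example and are not taken from any particular thesaurus system or database.

```python
# Hypothetical, hand-made thesaurus entry; a real thesaurus is far larger.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["coastal sea"],     # narrower term
        "RT": ["tide"],            # related term
        "UF": ["ocean"],           # synonym (use(d) for)
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with the thesaurus terms found for it into an OR query."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote each term so that multi-word terms are searched as phrases.
    return " OR ".join(f'"{t}"' for t in terms)

query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms tends to increase recall; whether broader or related terms should be added as well depends on how much loss of precision is acceptable.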

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
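
Both directions of citation searching can be illustrated with a small Python sketch. The document names and the citation data below are invented for the example; a real citation index holds millions of such links.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old_a", "old_b"],
    "old_a": ["older_c"],
    "new_x": ["seed"],
    "new_y": ["seed", "old_a"],
}

def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {doc}
    for _ in range(depth):
        # Collect the references of every document reached in the previous step.
        frontier = {ref for d in frontier for ref in cites.get(d, [])} - found
        found |= frontier
    return found

def cited_by(doc, cites):
    """Find more recent documents that cite `doc` (the job of a citation index)."""
    return {d for d, refs in cites.items() if doc in refs}

older = snowball("seed", CITES)
newer = cited_by("seed", CITES)
print(sorted(older), sorted(newer))
```

The first function needs only the reference lists of documents already in hand; the second needs an index built over many documents, which is why forward searching depends on a citation database.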

211

***-

Information retrieval:
using citations (scheme)
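Scheme (time axis: Past, Now, Future):
»snowball citation searching leads from the seed document
back to older publications (Past)
»citation indexing leads from the seed document
forward to more recent publications (towards Now / Future)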

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
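
The variants listed above can be generated automatically, as in the following minimal Python sketch. It assumes that the default file name of the site is index.html, which is common but not guaranteed; the function name is chosen only for this illustration.

```python
def url_variants(url, default_file="index.html"):
    """Return spellings of a URL that may all point to the same page.

    Assumption: the web server serves `default_file` for the directory.
    """
    base = url.rstrip("/")
    suffix = "/" + default_file
    if base.endswith(suffix):
        # Strip the default file name to get back to the directory.
        base = base[: -len(suffix)]
    return [base + suffix, base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```

Each variant can then be submitted as a separate link query to the search system, using the syntax given in its help pages.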

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
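
The simplified idea stated above (more links received means a higher rank) can be sketched in Python as follows. This is a deliberately naive illustration with invented page and site names; the link analysis actually applied by Google is more elaborate than a plain count of incoming links.

```python
# Hypothetical link data: each page maps to the sites it links to.
LINKS = {
    "page1": ["siteA", "siteB"],
    "page2": ["siteA"],
    "page3": ["siteA", "siteC", "siteB"],
}

def rank_by_inlinks(sites, links):
    """Sort directory entries by the number of pages linking to them."""
    counts = {site: 0 for site in sites}
    for targets in links.values():
        # Count each linking page only once per target.
        for target in set(targets):
            if target in counts:
                counts[target] += 1
    # Most-linked first; ties are broken alphabetically.
    return sorted(sites, key=lambda site: (-counts[site], site))

ranking = rank_by_inlinks(["siteC", "siteB", "siteA"], LINKS)
print(ranking)
```

With purely alphabetical sorting, as in the DMOZ directory, the order would carry no information about how often a site is referred to.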

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

A global subject directory, covering the complete WWW, can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
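
The two components described above can be illustrated with a minimal Python sketch of an index built from page texts and a search function that consults only that index. The URLs and page texts are invented and stand in for what a "robot" would collect; real systems add ranking, phrase searching and much more.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
}

def build_index(pages):
    """Build an index that maps each word to the URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
hits = search(INDEX, "marine science")
print(hits)
```

The query is answered from the prepared database only; the pages themselves are not visited at search time, which is why results can lag behind the current state of the WWW.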

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
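
The two ranking rules above can be illustrated with a simplified link-based scoring loop. This is only a sketch in the spirit of link analysis, not Google's actual algorithm; the damping value, the number of iterations and the small link graph are assumptions made for the illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; `links` maps a page to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it points to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B point to C, C points to D, E points to A.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}
rank = link_rank(links)
```

In this graph C scores higher than A because two pages point to it, and D scores higher than A although each has only one incoming link, because the page pointing to D is itself "important".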

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
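
What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and hand-made, and the OR-group notation is only an assumption about how the expanded query could be written; how Google selects synonyms internally is not described here.

```python
# Hypothetical, hand-made synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into an OR-group of the word and its synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)

expanded = expand_query("~cheap car rental")
```

Here only the word marked with a tilde is expanded; the other words of the query are left as they are.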

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
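
Recall and precision can be made concrete with a small calculation. The document identifiers and the two result lists below are invented for the illustration; they are not measurements of any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / retrieved; recall = relevant hits / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: 4 documents (d1..d4) are relevant to the information need.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2", "x4", "x5", "x6"]
specialised_engine = ["d1", "d2", "d3", "x1"]

general = precision_recall(general_engine, relevant)
specialised = precision_recall(specialised_engine, relevant)
```

In this invented case the specialised system returns fewer irrelevant documents (higher precision) and finds more of the relevant ones (higher recall).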

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
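
For readers who do not know truncation from classical retrieval systems, the following sketch shows what a trailing truncation symbol does. The use of `*` as the truncation symbol and the sample sentence are assumptions made for the illustration.

```python
import re

def truncation_pattern(term):
    """Compile a query term in which a trailing * stands for any word ending."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matching_words(term, text):
    """Return all words in the text that match the (possibly truncated) term."""
    return truncation_pattern(term).findall(text)

text = "The librarian catalogued two libraries; one library was closed."
truncated = matching_words("librar*", text)
exact = matching_words("library", text)
```

Without truncation, the searcher has to enter each word variant separately to obtain the same coverage.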

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
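
A proximity operator such as NEAR can be sketched as a check on word positions. The function name `near`, the default distance and the sample sentence are hypothetical; real retrieval systems define the distance in their own way.

```python
def near(text, word1, word2, max_distance=3):
    """True when word1 and word2 occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

sentence = "Open access to scientific information is growing"
close_enough = near(sentence, "open", "information", max_distance=4)
too_far = near(sentence, "open", "information", max_distance=3)
```

A plain Boolean AND would accept both words anywhere in a long document; the proximity check accepts them only when they stand close together.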

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
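
Clustering "on the fly" can be illustrated with a deliberately crude sketch that groups result titles under their most frequent terms at query time. This is not the method used by Vivisimo; the stop word list, the titles and the rule "most frequent term becomes the heading" are assumptions made for the illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles, max_clusters=2):
    """Group titles under the most frequent terms that are not in the query."""
    query_words = set(query.lower().split())
    tokenised = []
    for title in titles:
        words = [w for w in title.lower().split()
                 if w not in STOPWORDS and w not in query_words]
        tokenised.append(words)
    # Count in how many titles each term occurs.
    counts = Counter()
    for words in tokenised:
        for word in dict.fromkeys(words):
            counts[word] += 1
    labels = [word for word, _ in counts.most_common(max_clusters)]
    clusters = defaultdict(list)
    for title, words in zip(titles, tokenised):
        label = next((l for l in labels if l in words), "other")
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Free Pascal programming compiler",
    "Blaise Pascal mathematician",
    "Turbo Pascal programming history",
    "Pascal pressure unit",
]
clusters = cluster_results("pascal", titles)
```

The headings are derived only from the retrieved results, so no processing of the documents is needed before the search.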

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
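
The integration step can be sketched as interleaving the ranked lists of several search systems and dropping repeated URLs. The normalisation rule (ignore "www." and a trailing slash, and also ignore any query string) and the example URLs are simplifying assumptions, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to a comparison key: lowercase host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and keep only the first copy of each URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    # Take rank 1 from every engine, then rank 2, and so on.
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical result lists from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.com/page"]
engine_2 = ["http://example.org", "http://example.net/info",
            "http://www.example.com/page/"]
merged = merge_results([engine_1, engine_2])
```

Five raw results shrink to three, because two of them point to pages that the other engine had already reported.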

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
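
How a tracking service can tell modified pages from new ones can be sketched with content fingerprints. The page contents are given here as plain strings instead of being fetched from the WWW, and the URLs are hypothetical; an actual service would also have to ignore irrelevant changes such as dates or counters.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current_pages):
    """Compare stored fingerprints with freshly collected page contents."""
    report = {"new": [], "modified": [], "unchanged": []}
    updated = {}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        updated[url] = digest
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["modified"].append(url)
        else:
            report["unchanged"].append(url)
    return report, updated

# First visit: nothing is stored yet, so every page counts as new.
stored = {}
first_report, stored = detect_changes(stored, {
    "http://example.org/a": "version 1",
    "http://example.org/b": "hello",
})
# Second visit: page a has changed and page c has appeared.
second_report, stored = detect_changes(stored, {
    "http://example.org/a": "version 2",
    "http://example.org/b": "hello",
    "http://example.org/c": "brand new",
})
```

Only the fingerprints have to be stored between visits, not the pages themselves.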

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops that sell books
(as well as music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• It does not support searching with a classification or a thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
Relations between a given Term and other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled
list of words and terms, the searcher can first consult
one or several thesaurus systems to find more, and more
suitable, words and terms, and then use these to
formulate a query for that database (to increase recall
and precision).
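
As a minimal sketch of this application, the Python fragment below expands a search term with terms taken from a thesaurus before the query is sent to a database. The thesaurus entries are made up for illustration, and the OR syntax is an assumption: the actual query syntax depends on the database that is searched.

```python
# A tiny, made-up thesaurus (assumption): the entries are illustrative only.
# Keys are the usual relation codes: UF, BT, NT, RT.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "BT": ["body of water"],
        "NT": ["bay", "gulf"],
        "RT": ["coast"],
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus its thesaurus terms.

    Synonyms (UF) and narrower terms (NT) mainly increase recall;
    adding broader terms (BT) widens the search further.
    """
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put multi-word terms between quotes so they are searched as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)

print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "BT")))
```

A term that does not occur in the thesaurus is simply returned unchanged, so the function can be applied to every word of a query.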

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
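
The two directions can be illustrated with a small Python sketch. The document identifiers and reference lists below are invented for the example. Following reference lists corresponds to the snowball effect; looking up which documents cite the seed is what a citation index provides.

```python
# Hypothetical data (assumption): each document maps to the documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "new1"],
}

def snowball(seed, references):
    """Follow reference lists back in time, step by step (snowball effect)."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(references.get(doc, []))
    return found

def cited_by(seed, references):
    """Find the more recent documents whose reference list contains the seed."""
    return sorted(doc for doc, refs in references.items() if seed in refs)

print("older publications:", snowball("seed", REFERENCES))
print("more recent publications:", cited_by("seed", REFERENCES))
print("citations received:", len(cited_by("seed", REFERENCES)))
```

The last line counts the citing documents, which is the simple measure used when the number of citations received by a document is estimated with a citation search.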

211

***-

Information retrieval:
using citations (scheme)
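[Scheme: a time axis labelled Past, Now and Future. Starting from the seed document, snowball citation searching leads to older publications, and citation indexing leads to more recent publications.]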

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
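
The variants can be generated automatically, as in the following Python sketch. It assumes that `index.html` is the default document name of the web server, which is common but not universal, and it leaves the query syntax itself to the help pages of the search system.

```python
def url_variants(url):
    """Return the forms of a URL that are worth searching for separately.

    Assumption: "index.html" is the default document of the web server.
    The scheme (http://) is dropped, as in the variants listed above.
    """
    rest = url.split("//", 1)[-1]
    last_segment = rest.rsplit("/", 1)[-1]
    # A URL that points to a specific file other than index.html has no variants.
    if "/" in rest and "." in last_segment and last_segment != "index.html":
        return [rest]
    if rest.endswith("/index.html"):
        base = rest[: -len("/index.html")]
    else:
        base = rest.rstrip("/")
    return [base + "/index.html", base + "/", base]

for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be used in a separate link search, or as an exact phrase in a normal text search.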

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know,
then we can NOT ONLY use an Internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 212


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
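
The difference between the two orderings can be shown with a short Python sketch. The site names and link counts are invented, and counting inbound links is only the simplified rule stated above; the link analysis that Google actually applies is more elaborate.

```python
# Made-up directory category (assumption): site -> number of inbound links.
category = {
    "zeta-ocean.example": 120,
    "alpha-sea.example": 3,
    "marine-lab.example": 45,
}

# Ordering as in the original directory: alphabetical.
alphabetical = sorted(category)

# Simplified link-based ordering: most inbound links first,
# with the name as a tie-breaker.
by_inlinks = sorted(category, key=lambda site: (-category[site], site))

print("alphabetical:", alphabetical)
print("by inbound links:", by_inlinks)
```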

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow you to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
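The mechanism described above can be illustrated with a minimal sketch. It assumes a tiny set of hypothetical pages (the example.org URLs and their texts are invented for illustration) and builds a simple inverted index from them; real Internet indexes are far larger and more sophisticated.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, as a crawler ("robot") might have collected them.
pages = {
    "http://example.org/a": "Marine science and oceanography portal",
    "http://example.org/b": "Civil engineering resources",
    "http://example.org/c": "Oceanography data for marine engineering",
}
index = build_index(pages)
print(sorted(search(index, "marine oceanography")))
```

The query is answered from the prepared index only; no page is visited at query time, which is why such systems can respond quickly.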

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
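The idea of ranking by links can be sketched as follows. This is only an illustration in the style of link-analysis ranking, not the actual Google algorithm; the link graph, the damping value and the number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages from the link graph (link-analysis sketch)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it points to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, C scores higher than A or B because two pages point to it, and D scores high because the "important" page C points to it.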

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
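What such an automatic expansion does can be sketched roughly as follows. The synonym table and the function name are hypothetical, and the way Google selects synonyms internally is not documented here; the sketch only shows the principle of replacing a marked word by an OR group.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("used ~car prices"))
# prints: used (car OR automobile OR vehicle) prices
```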

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
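Recall and precision, mentioned above, can be computed as in the following minimal sketch. Precision is the fraction of retrieved items that are relevant; recall is the fraction of relevant items that were retrieved. The document identifiers and the two result lists are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers.
relevant = {"d1", "d2", "d3", "d4"}
general_engine = ["d1", "x1", "x2", "x3", "d2"]
specialised_engine = ["d1", "d2", "d3", "x1"]

print(precision_recall(general_engine, relevant))      # (0.4, 0.5)
print(precision_recall(specialised_engine, relevant))  # (0.75, 0.75)
```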

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
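For readers unfamiliar with truncation, the following sketch shows what a retrieval system that does support manual truncation would do with a query word such as librar*. The asterisk syntax and the function names are assumptions for illustration; they do not describe any particular system.

```python
import re


def truncation_to_regex(pattern):
    """Turn a query word with a trailing * into a regular expression."""
    if pattern.endswith("*"):
        return re.compile(re.escape(pattern[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(re.escape(pattern), re.IGNORECASE)


def matching_words(pattern, text):
    """Return the distinct words in the text that match the pattern, sorted."""
    regex = truncation_to_regex(pattern)
    words = re.findall(r"\w+", text)
    return sorted({w.lower() for w in words if regex.fullmatch(w)})


text = "Libraries and librarians catalogue books; a library lends them."
print(matching_words("librar*", text))
# prints: ['librarians', 'libraries', 'library']
```

Without truncation or stemming, the searcher has to type each of these word forms separately.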

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
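Clustering on the fly can be sketched very roughly as below. This is not the method used by Vivisimo, which is not described here; it is a deliberately simple assumption in which each result snippet is filed under the word it shares with the most other snippets. The snippets are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on"}


def cluster_results(query, snippets):
    """Group result snippets under the most widely shared word they contain."""
    ignore = STOPWORDS | set(query.lower().split())
    words_per_snippet = [
        set(re.findall(r"[a-z]+", s.lower())) - ignore for s in snippets
    ]
    # Count in how many snippets each word occurs.
    doc_freq = Counter(w for words in words_per_snippet for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, words_per_snippet):
        if words:
            # Ties are broken alphabetically so the outcome is deterministic.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical snippets returned for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Free Pascal compiler for the programming language",
    "Blaise Pascal, philosopher and mathematician",
    "Works by the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, "->", members)
```

The grouping is computed from the retrieved snippets only, at query time, so no pre-processing of the documents is required.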

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
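The integration of results with removal of repeated results might look like the following sketch. The two ranked lists are invented, and the normalisation rule (ignoring "www." and a trailing slash) is a simplifying assumption; real meta-search systems may merge and rank differently.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparable key (lower-case host, no trailing slash)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalise(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.example.org/", "http://example.org/fish"]
engine_2 = ["http://example.org", "http://example.net/ocean"]
print(merge_results([engine_1, engine_2]))
```

Interleaving by rank position keeps the top results of every engine near the top of the merged list.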

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ tells us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
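This use of a thesaurus to prepare a query can be sketched as follows. The tiny thesaurus and the function `expand_query` are assumptions made for this example; the entries are illustrative and are not taken from any of the thesaurus systems mentioned in these slides.

```python
# Toy thesaurus with the classical relations (illustrative entries only)
THESAURUS = {
    "sea": {
        "UF": ["ocean"],          # synonyms
        "NT": ["lagoon"],         # narrower terms
        "BT": ["body of water"],  # broader terms
        "RT": ["coast"],          # related terms
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Return an OR query that combines a term with its thesaurus terms.

    By default only synonyms (UF) and narrower terms (NT) are added, which
    increases recall without broadening the subject of the search.
    """
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for extra in entry.get(relation, []):
            if extra not in terms:
                terms.append(extra)
    # Multi-word terms are quoted so that they are searched as a phrase
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
```

Adding broader or related terms as well, for instance with `relations=("UF", "NT", "BT", "RT")`, retrieves more documents but usually at the cost of precision.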

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
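Both directions of citation searching can be sketched on a small citation graph. The document names and the functions `snowball` and `cited_by` are hypothetical; a real citation index holds the same kind of data for millions of publications.

```python
# Hypothetical citation data: each document maps to the documents it cites
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites=CITES):
    """Backward search: all older documents reached by following references."""
    found = set()
    stack = list(cites.get(doc, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(cites.get(current, []))
    return found


def cited_by(doc, cites=CITES):
    """Forward search, as in a citation index: documents that cite doc."""
    return {source for source, refs in cites.items() if doc in refs}


print(sorted(snowball("seed")))
print(sorted(cited_by("seed")))
```

The backward search needs only the reference lists of the documents themselves, whereas the forward search needs an index built over the reference lists of all later publications, which is why it depends on a citation index.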

211

***-

Information retrieval:
using citations (scheme)
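[Scheme: a time axis running from past, through now, to future.
Starting from the seed document, snowball citation searching leads
back to older publications; citation indexing leads forward to more
recent publications that cite the seed document.]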

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who considered a particular page
interesting enough to link to it
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
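Generating these variants can be sketched as follows. The function `url_variants` is an assumption made for this example; it presumes that the URL refers to a site or directory whose default page is called index.html or index.htm, which is common but not universal.

```python
DEFAULT_PAGES = ("index.html", "index.htm")


def url_variants(url):
    """Return the forms under which the same default page may be linked.

    The result lists the form with the index page, the form with a
    trailing slash and the form without a trailing slash.
    """
    base = url
    for page in DEFAULT_PAGES:
        if base.endswith("/" + page):
            base = base[: -len(page)]  # drop the name of the default page
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then entered in turn in the link search of the chosen search engine, using the query syntax described in its help pages, and the results are combined.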

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 213

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
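The simplified statement above, that a site ranks higher when it receives more links, can be sketched as follows. This is a deliberately naive illustration: the function `rank_by_inlinks` and the sample data are assumptions, and the sketch only counts links, whereas the link analysis actually used by Google is more refined than a plain count.

```python
from collections import Counter


def rank_by_inlinks(directory_sites, links):
    """Sort directory sites by the number of distinct pages linking to them.

    links is a list of (source, target) pairs. Duplicate pairs and links
    from a site to itself are ignored; ties are sorted alphabetically.
    """
    inlinks = Counter(
        target for source, target in set(links) if source != target
    )
    return sorted(directory_sites, key=lambda site: (-inlinks[site], site))


# Hypothetical directory category and hypothetical links found on the WWW
sites = ["a.example", "b.example", "c.example"]
links = [
    ("x.example", "b.example"),
    ("y.example", "b.example"),
    ("x.example", "c.example"),
    ("x.example", "c.example"),  # duplicate, counted once
]
print(rank_by_inlinks(sites, links))
```

Compared with the alphabetical order of the original directory, this ordering puts the sites that other authors found worth linking to at the top of each category.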

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a large index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
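
The mechanism described on this slide can be illustrated with a minimal sketch in Python. The page URLs and texts are invented for illustration, and the choice to require all query words is an assumption of this sketch. Real systems add crawling, ranking and much larger data structures; the point here is only that a query is answered from a prebuilt index, not by visiting pages at query time.

```python
from collections import defaultdict

# Hypothetical pages, as a "robot" might have collected them (invented data).
PAGES = {
    "http://example.org/a": "marine science and oceanography sources",
    "http://example.org/b": "open access sources for marine biology",
    "http://example.org/c": "civil engineering directory",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
```

Calling `search(INDEX, "marine sources")` looks up two small sets and intersects them; none of the pages is read again.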

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
the more familiar Google web search and the
Amazon.com book search system separately.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
»no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
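
The two ranking rules above (many pages point to it, "important" pages point to it) can be sketched as a simplified link-analysis computation. This is not Google's actual algorithm. The damping value 0.85, the number of iterations and the small link graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified link-analysis ranking by repeated score propagation.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Invented link graph: A, B and C all point to D; D points back to A.
LINKS = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
RANKS = link_rank(LINKS)
```

In this invented graph, D ranks highest because many pages point to it, and A ranks above B and C because the "important" page D points to it.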

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
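
What such an automatic expansion might do behind the scenes can be sketched as follows. The synonym table and the function name `expand_query` are invented for illustration; how Google selects synonyms internally is not described in this presentation.

```python
# Invented synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)
```

Only the words marked with a tilde are expanded; the other words in the query are left untouched.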

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• Since 2004, these have all been owned by the same U.S.
company, Yahoo!

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several famous, big Internet
indexes/databases that are created by other companies;
the user/searcher selects which one.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers access not only to files in HTML format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (also serving Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
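
Recall and precision, the two quality measures named on this slide, can be computed as in the following sketch. The function name and the document identifiers in the usage note are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a search returns four documents d1 to d4 and the relevant documents are d1, d2 and d5, then two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents were found (recall 2/3).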

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
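
To make two of the missing features concrete, the sketch below shows what right-hand truncation and a proximity (NEAR) operator do in a retrieval system that supports them. The function names, the `*` truncation symbol, the default distance of 3 words and the sample text are assumptions for illustration only.

```python
import re

def truncation_match(pattern, text):
    """Find words matching a right-truncated term such as 'librar*'."""
    stem = pattern.rstrip("*").lower()
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w.startswith(stem)]

def near(term1, term2, text, distance=3):
    """True if term1 and term2 occur within `distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= distance for i in pos1 for j in pos2)

TEXT = "The library offers catalogues; librarians index open access journals."
```

With truncation, one query term such as `librar*` retrieves both "library" and "librarians"; without it, the searcher has to type each variant.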

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
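
Clustering "on the fly" means that the grouping is computed from the retrieved results only, at query time. The following is a deliberately naive sketch of that idea and not the method used by Vivisimo, which is not described here. The stop word list, the labelling rule and the sample snippets are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "by"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {w.strip(".,;:").lower() for w in snippet.split()}
        tokenised.append(words - STOPWORDS - query_words - {""})
    # Document frequency: in how many snippets does each word occur?
    doc_freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Most widely shared word; ties are broken alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Invented result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Thoughts by the philosopher Blaise Pascal",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
```

Here the four snippets fall into two groups, one about the programming language and one about the person, without any processing of the documents before the search.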

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
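
The integration step can be sketched as follows: several primary search systems are queried in parallel and their result lists are merged, keeping each link only once. The two engine functions are hypothetical stand-ins that return fixed, invented results; a real meta-search system would send the query over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems (invented results).
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Query all engines in parallel and merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

MERGED = meta_search("marine science", [engine_one, engine_two])
```

The link that both engines return appears only once in the merged list.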

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
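
The difference between a modified page and a new page can be illustrated with a small sketch. It assumes that the tracking system keeps a fingerprint of each page from its previous visit. The function names are invented, and fetching the pages from the WWW is left out.

```python
import hashlib

def fingerprint(page_text):
    """Return a fingerprint (SHA-256 hash) of the page content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

def check_pages(stored, current):
    """Compare stored fingerprints with the current page contents.

    stored:  dict mapping URL -> fingerprint from the previous visit
    current: dict mapping URL -> page text fetched now
    Returns (modified, new) lists of URLs.
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in stored:
            new.append(url)
        elif stored[url] != fingerprint(text):
            modified.append(url)
    return modified, new
```

An alert service could run such a comparison at regular intervals and send the two lists to the subscriber by email.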

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
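
A keyword-based interest profile can be matched against newly published titles as in the sketch below. The function name, the profile and the book titles are invented for illustration; how a particular bookshop implements its alert service is not described here.

```python
def matching_books(profile_keywords, new_books):
    """Return the new titles that contain at least one profile keyword."""
    keywords = {k.lower() for k in profile_keywords}
    alerts = []
    for title in new_books:
        title_words = {w.strip(".,:;").lower() for w in title.split()}
        if keywords & title_words:
            alerts.append(title)
    return alerts
```

A profile based on subject categories would work in the same way, with the category assigned to each book taking the place of the words in its title.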

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
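
The idea of a merged catalogue can be illustrated with a small sketch. It assumes, purely for illustration, that each record carries an ISBN and that records with the same ISBN describe the same book; the function name and the record layout are hypothetical, and real union catalogues need more careful matching.

```python
def build_union_catalogue(catalogues):
    """Merge the records of several libraries into one catalogue.

    catalogues: dict mapping a library name to a list of records,
    each record being a dict with the keys 'isbn' and 'title'.
    """
    union = {}
    for library, records in catalogues.items():
        for record in records:
            # One entry per ISBN; remember every library that holds the book.
            entry = union.setdefault(
                record["isbn"], {"title": record["title"], "held_by": []}
            )
            entry["held_by"].append(library)
    return union
```

Each entry in the result lists the libraries that hold the book, which is the added value of a union catalogue over the separate catalogues.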

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
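
The mechanism of simultaneous searching can be sketched as below. This is a minimal illustration, not the implementation of any of the services mentioned here: the targets are small in-memory title lists instead of remote catalogues, and all names are hypothetical. A target that fails is skipped, which mirrors the remark that the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalogue(name, titles, query):
    """Search one target; here simply a case-insensitive title match."""
    return [(name, title) for title in titles if query.lower() in title.lower()]


def meta_search(targets, query):
    """Send the same query to all targets in parallel and merge the results.

    targets: dict mapping a target name to its list of titles.
    """
    merged = []
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_catalogue, name, titles, query)
            for name, titles in targets.items()
        ]
        for future in futures:
            try:
                merged.extend(future.result())
            except Exception:
                # An unavailable or failing target contributes nothing.
                continue
    return merged
```

Only one simple query type is passed to every target, which reflects why the search options of meta-search services are rather limited.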

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined use of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• It does not offer the use of a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other, related term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced with
added descriptors (words and terms) taken from a controlled
list, the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and can then use these to formulate a query for that database
(to increase recall and precision).
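
Using a thesaurus to prepare a query can be sketched as follows. The tiny thesaurus entry, the function name and the quoted "OR" query syntax are assumptions made for the example; each search system has its own syntax, and the entry for "sea" is invented rather than taken from a real thesaurus.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["water body"],   # broader term
        "NT": ["coastal sea"],  # narrower term
        "RT": ["tide"],         # related term
        "UF": ["ocean"],        # synonym covered by the preferred term
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term and the terms found in the thesaurus."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return " OR ".join(f'"{t}"' for t in terms)
```

Adding synonyms and narrower terms with OR tends to increase recall, whereas replacing a vague term by a more specific one found in the thesaurus tends to increase precision.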

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
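
The two directions of citation searching can be illustrated with a small sketch. The reference lists below are invented sample data and the function names are hypothetical; a commercial citation index is of course built from far larger and cleaner data.

```python
# Invented sample data: each document maps to the documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1998"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
}


def build_citation_index(references):
    """Invert the reference lists: for each document, who cites it?"""
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)
    return cited_by


def snowball(seed, references):
    """Follow references from the seed back to older documents."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc not in found:
            found.append(doc)
            # The references of a found document are followed as well.
            queue.extend(references.get(doc, []))
    return found
```

Here `snowball("seed_2000", REFERENCES)` leads to the older publications, while `build_citation_index(REFERENCES)["seed_2000"]` lists the more recent publications that cite the seed document.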

211

***-

Information retrieval:
using citations (scheme)
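Time axis: Past, Now, Future.
Seed document: the starting point.
Snowball citation searching: from the seed document back to
older publications (the past).
Citation indexing: from the seed document forward to more
recent publications that cite it (towards the future).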

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
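
The variants listed above can be generated with a small helper. This is a sketch under the assumption that the default file of the site is called index.html, as in the example; web servers can use other default file names, and the function name is hypothetical.

```python
def url_variants(url):
    """Return the common spellings of a site address for link searching."""
    base = url
    # Reduce any of the three spellings to the bare directory address.
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    base = base.rstrip("/")
    # Return the address with the default file, with a slash and bare.
    return [base + "/index.html", base + "/", base]
```

Each variant can then be submitted as a separate link query, following the syntax described in the help pages of the search system.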

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
NOT ONLY an Internet index / search engine with specialised
searching in the hyperlink fields of web pages,
but ALSO a more “normal” search for web documents/pages
that contain words or exact phrases that occur in the URL
of that web document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 214

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
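
The simplified principle stated above, ranking by the number of links received, can be sketched as follows. This is a plain link count for illustration only and not the actual link analysis used by Google; the function name and the sample site names are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(sites, links):
    """Order the sites of a directory category by links received.

    sites: site names in the category.
    links: (source, target) pairs found on the WWW.
    """
    inbound = Counter(target for _, target in links)
    # Most linked first; ties fall back to alphabetical order.
    return sorted(sites, key=lambda site: (-inbound[site], site))
```

With no link data at all, the function returns the purely alphabetical order that the slide attributes to the DMOZ directory.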

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

the complete WWW

a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
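
The mechanism described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not how any particular search engine works: the pages and the example.org URLs are hypothetical stand-ins for what a robot would collect, and the function names are chosen for this sketch.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler ("robot") collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why the database must be kept up to date by the robot.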

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that a
maximum of 10 words could be used in a query, up to 2005.)
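
The idea that a page ranks higher when many pages, and especially "important" pages, point to it can be illustrated with a simplified iteration in the style of PageRank. This is a sketch under assumptions, not Google's actual ranking: the damping value, the number of iterations and the tiny link graph are all chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links with a simplified PageRank iteration.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, shared over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is pointed to by A, B and D; D by C and E.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["C"], "E": ["D"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph C ends up on top because it receives the most links, and D ranks far above A, B and E because one of its two incoming links comes from the highly ranked page C.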

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
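
What such an automatic expansion amounts to can be sketched as follows. How Google implements the tilde operator internally is not described here; the synonym table and the function name below are hypothetical, and the OR notation is only one possible way to write the expanded query.

```python
# Hypothetical synonym table; a real system would use a full thesaurus.
SYNONYMS = {
    "fish": ["fishes", "fishing"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Expand every word prefixed with '~' into an OR group of synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + synonyms.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(term)
        else:
            parts.append(word)
    return " ".join(parts)

print(expand_query("pollution ~sea coast"))
# prints: pollution (sea OR ocean OR marine) coast
```

Only the marked word is expanded; the other words of the query are left exactly as the searcher typed them.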

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
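
Recall and precision, mentioned above, are standard measures of retrieval quality: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A small worked sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant documents exist,
# 2 of the retrieved documents are relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 2 of 4 retrieved documents are relevant
print(recall)     # 2 of 3 relevant documents were retrieved
```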

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
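
To make clear what is missing, manual right truncation as offered by many classical retrieval systems can be sketched like this. The asterisk as truncation symbol, the function names and the example documents are assumptions for illustration only.

```python
def matches_truncated(term, word):
    """Right truncation: 'librar*' matches any word that starts with 'librar'."""
    term = term.lower()
    word = word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def search_truncated(term, documents):
    """Return the titles of documents containing a word that matches the term."""
    found = []
    for title, text in documents.items():
        if any(matches_truncated(term, w) for w in text.split()):
            found.append(title)
    return found

# Hypothetical documents.
documents = {
    "doc1": "the library catalogue",
    "doc2": "libraries and librarians",
    "doc3": "liberty",
}
print(search_truncated("librar*", documents))  # ['doc1', 'doc2']
print(search_truncated("library", documents))  # ['doc1']
```

Without truncation or stemming, the searcher has to type each word variant (library, libraries, librarian…) explicitly to reach the same documents.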

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
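
A very rough idea of clustering search results on the fly can be given with the sketch below. It is not the method used by Vivisimo, which is not described here; it simply groups result titles under the most widely shared word that is not part of the query. The stopword list, function names and result titles are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "in", "of", "the"}

def candidate_terms(text, query_words):
    """Words of a result that may serve as a cluster heading."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - query_words

def cluster_results(query, results):
    """Group results under the most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    terms_per_result = [candidate_terms(r, query_words) for r in results]
    # Count in how many results each candidate word occurs.
    doc_freq = Counter(t for terms in terms_per_result for t in terms)
    clusters = defaultdict(list)
    for result, terms in zip(results, terms_per_result):
        if terms:
            # Most frequent word wins; ties are broken alphabetically.
            label = min(terms, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(result)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
results = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Thoughts of the philosopher Blaise Pascal",
    "Turbo Pascal programming compiler",
]
clusters = cluster_results("pascal", results)
for label, items in clusters.items():
    print(label, items)
```

Everything is computed from the retrieved results at query time, so no pre-processing of the documents before the search is needed.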

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than one Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
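
The integration of results with removal of repeated results can be sketched as follows. This is a simplified illustration, not the method of any specific meta-search system: the result lists are hypothetical, the ranked lists are simply interleaved, and the URL normalisation (lowercasing, dropping the scheme, "www." and a trailing slash) is a deliberately crude assumption.

```python
def normalize(url):
    """Reduce a URL to a comparison key (a crude simplification)."""
    key = url.strip().lower()
    for prefix in ("http://", "https://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    if key.startswith("www."):
        key = key[4:]
    return key.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged = []
    seen = set()
    longest = max((len(r) for r in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                key = normalize(results[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[position])
    return merged

# Hypothetical ranked results from two primary search systems.
engine_a = ["http://www.vub.ac.be/BIBLIO/", "http://bubl.ac.uk/link/"]
engine_b = ["http://bubl.ac.uk/link", "http://www.dmoz.org/"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

Interleaving by rank position keeps the top hits of every primary system near the top of the merged list.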

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
Internet: telnet, ftp, ... and the WWW

»Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
»Word files, PDF files
»Databases and file archives accessible through
the Internet (CGI, ASP,...)
»Information accessible only when passwords are used
»Rapidly changing information, such as news

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, do not use only WWW indexes,
but also use other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
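
How a tracking service can tell modified pages from new ones is sketched below, under the assumption that the service stores a fingerprint of each page at every visit. The function names and the example.org URLs are hypothetical, and fetching the pages from the WWW is left out.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 digest of the page content (a str)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_pages(stored, current):
    """Compare stored fingerprints with freshly fetched page contents.

    stored:  dict URL -> fingerprint from the previous visit
    current: dict URL -> page content fetched now
    Returns (modified, new): two sorted lists of URLs.
    """
    modified, new = [], []
    for url, content in current.items():
        if url not in stored:
            new.append(url)
        elif stored[url] != fingerprint(content):
            modified.append(url)
    return sorted(modified), sorted(new)

# Hypothetical state from the previous visit and the pages fetched now.
stored = {
    "http://example.org/news": fingerprint("Old headline"),
    "http://example.org/about": fingerprint("About us"),
}
current = {
    "http://example.org/news": "New headline",
    "http://example.org/about": "About us",
    "http://example.org/report": "A new report",
}
modified, new = check_pages(stored, current)
print(modified)  # ['http://example.org/news']
print(new)       # ['http://example.org/report']
```

An alert service would run such a comparison at regular intervals and send the two lists to the subscriber by email.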

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
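
The matching of new books against a stored interest profile can be sketched as follows. This is only an illustration, not the way Amazon or any other real service works: the class `InterestProfile`, the function names and the matching rule (a book fits when any keyword or any subject category matches) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical stored user profile: keywords and subject categories."""
    keywords: set = field(default_factory=set)
    subjects: set = field(default_factory=set)


def matches(profile, title, subjects):
    """Return True when a new book fits the profile.

    A book fits when any profile keyword occurs as a word in the title,
    or when the book shares at least one subject category with the profile.
    """
    title_words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    subject_hit = bool(
        {s.lower() for s in profile.subjects} & {s.lower() for s in subjects}
    )
    return keyword_hit or subject_hit


def alerts(profile, new_books):
    """Select the titles of the new books for which the user should be alerted.

    `new_books` is a list of (title, list_of_subject_categories) pairs.
    """
    return [title for title, subjects in new_books if matches(profile, title, subjects)]
```

A real service would probably run such a comparison each time its database is updated and send the selected titles by email.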

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
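
The idea of a union catalogue can be illustrated with a small sketch. It assumes that records from different libraries can be matched on their ISBN; real union catalogues such as Copac may use other and more elaborate matching rules, and the function name and record layout below are hypothetical.

```python
def build_union_catalogue(catalogues):
    """Merge several library catalogues into one set of records with holdings.

    `catalogues` maps a library name to a list of (isbn, title) records.
    Records with the same ISBN are merged into a single entry that lists
    all libraries holding the book.
    """
    union = {}
    for library, records in catalogues.items():
        for isbn, title in records:
            entry = union.setdefault(isbn, {"title": title, "held_by": []})
            if library not in entry["held_by"]:
                entry["held_by"].append(library)
    return union
```

One search in the merged data then shows at once which libraries hold a particular book.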

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
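
A minimal sketch of simultaneous searching is given below. The catalogues are hypothetical in-memory lists that stand in for remote library and bookdealer databases; a real meta-search service would send the query over the network to each target system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote library and bookdealer catalogues.
CATALOGUES = {
    "library_a": ["Marine Ecology", "Coastal Management"],
    "library_b": ["Marine Ecology", "Ocean Chemistry"],
    "bookdealer_c": ["Introduction to Marine Biology"],
}


def search_one(name, query):
    """Search a single catalogue with a simple case-insensitive substring match."""
    q = query.lower()
    return [(name, title) for title in CATALOGUES[name] if q in title.lower()]


def meta_search(query):
    """Send the same query to all catalogues in parallel and merge the hits.

    Returns a dict that maps each title found to the sorted list of
    catalogues in which it was found.
    """
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_one(name, query), CATALOGUES))
    merged = {}
    for hits in result_lists:
        for name, title in hits:
            merged.setdefault(title, []).append(name)
    return {title: sorted(names) for title, names in merged.items()}
```

The sketch also suggests why search options tend to be limited: the same simple query has to be acceptable to every target system.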

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM

RECOMMENDED SYSTEMS

To find book titles about a
specific subject / topic

Library of Congress, British Library,
(Amazon)

To search for book titles
published before 1990

national libraries, Barnes&Noble,
Infoball, Alapage, Abebooks

Book title search
in general

Library of Congress, British Library,
Infoball

To find the price
of a book

Global Books in Print, Infoball,
online bookshops

To be informed regularly about
new books

Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
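
The use of thesaurus relations to formulate a query can be sketched as follows. The mini-thesaurus, its single entry and the query syntax (OR between terms, quotes around phrases) are assumptions made for this illustration; the relations in a real thesaurus and the syntax of a real search system may differ.

```python
# Hypothetical mini-thesaurus using the relation types BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms for the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    # Quote multi-word terms so that a search system can treat them as phrases.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

With this sample data, `expand_query("sea")` adds the synonym and the narrower term to the query, which tends to increase recall. Replacing a broad term by one of its narrower terms is the corresponding way to increase precision.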

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
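
Both directions of citation searching can be illustrated with a small sketch. The citation data are invented for this example; a real citation index covers many journals and is updated continuously.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed):
    """Follow reference lists backwards in time and collect all older documents reached."""
    found, to_visit = set(), list(REFERENCES.get(seed, []))
    while to_visit:
        doc = to_visit.pop()
        if doc not in found:
            found.add(doc)
            to_visit.extend(REFERENCES.get(doc, []))
    return found


def citing_documents(target):
    """Citation index lookup: the more recent documents that cite the target."""
    return {doc for doc, refs in REFERENCES.items() if target in refs}
```

Here `snowball("seed")` leads to the older documents, while `citing_documents("seed")` leads to the more recent ones. The size of the second set is the number of citations received by the seed document within this small data set.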

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
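
The variants listed above can be generated automatically, for instance as in the following sketch. It assumes that `index.html` is the default document of the web site; a server may use another default name, and the function name is hypothetical.

```python
def url_variants(url):
    """Return the variants of a URL that other pages may use when linking to it.

    Covers the three forms: with index.html, with a trailing slash,
    and without a trailing slash.
    """
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant can then be submitted as a separate link search, after which the results are combined.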

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
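
This more "normal" way of searching can be sketched as a plain text search for the URL string. The function name and the sample data are hypothetical, and ignoring the scheme and the letter case is a simplification chosen for this illustration.

```python
import re


def pages_citing(url, pages):
    """Return the addresses of pages whose text mentions the given URL.

    `pages` maps a page address to the text of that page. The scheme
    (http:// or https://) and the letter case are ignored in the comparison.
    """
    needle = re.sub(r"^https?://", "", url).lower()
    return sorted(address for address, text in pages.items() if needle in text.lower())
```

In contrast with a search in hyperlink fields, this approach also finds pages that mention the URL in their text without making an actual link to it.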

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 215

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
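
The "simply stated" rule above can be sketched in a few lines. This is only an illustration of counting received links, not the actual Google method; the site names and the link list are hypothetical.

```python
from collections import Counter

def rank_by_inlinks(entries, links):
    """Order directory entries by the number of links each one receives.

    entries: list of site names in one directory category.
    links:   list of (source, target) pairs observed on the WWW.
    Ties are broken alphabetically, which is the plain DMOZ ordering.
    """
    inlinks = Counter(target for _source, target in links)
    return sorted(entries, key=lambda site: (-inlinks[site], site))

# Hypothetical directory entries and observed links.
ENTRIES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]
print(rank_by_inlinks(ENTRIES, LINKS))
```

Without any link data the function falls back to the alphabetical order.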

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow the user to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the complete contents of
computers on the live Internet in real time
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by querying a big index that is built automatically
from the contents (the texts) of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
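
The two components described above (an index built from page texts, and a search function on top of it) can be sketched as follows. This is a minimal illustration, assuming that the pages have already been collected by a robot; the URLs and texts are hypothetical, and real systems add ranking, scale and freshness handling.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages, standing in for what a crawler ("robot") has collected.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
```

The query is answered from the index only; no page is fetched at search time, which is why such systems can respond quickly.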

49

**--

Internet indexes:
A9

• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited to a
maximum of 10 words per query.)
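
The two ranking criteria above (many pages point to a page; "important" pages point to it) can be combined in an iterative link score. The sketch below follows the general published PageRank idea in a simplified form; it is not the production Google algorithm, and the graph, the damping value and the function name are assumptions for illustration.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count more.

    graph: dict mapping each page to the list of pages it links to.
    Every link target must also appear as a key in graph.
    """
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # A page divides its score equally over its outgoing links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: three pages point to "hub".
GRAPH = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
SCORES = link_scores(GRAPH)
print(max(SCORES, key=SCORES.get))
```

In this example "hub" receives the most links and ends with the highest score, while "c", which receives no links, ends with the lowest.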

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
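
How such a tilde expansion could work can be sketched as follows. The synonym table is hypothetical and very small, and the matching is a plain word comparison; how Google implements this internally is not described in the source.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each ~word into a group of alternatives; leave other words alone.

    Returns a list of groups; a document matches when it contains
    at least one word from every group.
    """
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:].lower()
            groups.append([word] + synonyms.get(word, []))
        else:
            groups.append([term.lower()])
    return groups

def matches(text, groups):
    """True when the text contains at least one alternative from each group."""
    words = set(text.lower().split())
    return all(any(alt in words for alt in group) for group in groups)

print(expand_query("~cheap car"))
```

With this table, the text "an affordable car" matches the query `~cheap car` but not the query `cheap car`, which shows how the expansion retrieves more documents.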

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with ambiguity in the
meaning of the search query.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered its own independent system
that competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
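
The terms recall and precision used on this slide can be made concrete with a small calculation. The sketch below uses the standard definitions; the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant documents exist.
P, R = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(P, R)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall about 0.67).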

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
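
To illustrate what manual truncation offers in systems that support it, the sketch below matches a truncated query term against a word list. It relies on the Python standard library function `fnmatchcase`; the word list is hypothetical. Stemming would need a language-specific stemmer and is not shown.

```python
from fnmatch import fnmatchcase

def truncation_match(pattern, words):
    """Return the words matching a pattern in which * stands for any ending.

    Note: fnmatchcase also treats ? and [ ] as special characters.
    """
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]

# Hypothetical vocabulary of an index.
WORDS = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", WORDS))
```

One truncated term thus covers singular, plural and derived forms that would otherwise have to be entered one by one.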

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
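
A proximity operator such as NEAR, as offered by some other retrieval systems, can be sketched as a test on word positions. The function name, the default distance and the sample text are assumptions for illustration.

```python
def near(text, word1, word2, distance=5):
    """True when word1 and word2 occur within `distance` words of each other."""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= distance for i in pos1 for j in pos2)

TEXT = "marine biology of the north sea"
print(near(TEXT, "marine", "sea", distance=5))
print(near(TEXT, "marine", "sea", distance=3))
```

A plain AND query would accept both words anywhere in the document; the proximity test accepts them only when they are close together, which usually increases precision.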

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems

http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
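
Clustering on the fly, using only the retrieved results, can be sketched as follows. This is a toy method that groups result snippets under the word they share most widely; it is not the Vivisimo algorithm, and the snippets and stop word list are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the most widely shared word that each one contains."""
    query_words = set(query.lower().split())
    tokenized = []
    doc_freq = Counter()
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        words -= STOP_WORDS | query_words
        tokenized.append(words)
        doc_freq.update(words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        if words:
            # Prefer the word shared by most snippets; break ties alphabetically.
            label = min(words, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pensees by the philosopher Blaise Pascal",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, members)
```

No document is processed before the search: the clusters are derived only from the snippets that the query returned.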

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
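
The integration with removal of repeated results can be sketched as follows. The merging strategy (taking results in turn from each primary system) and the URL normalisation are assumptions for illustration; real meta-search systems may use other rules. The query string of a URL is ignored in this sketch.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key: host without www., path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(result_lists):
    """Interleave ranked lists from several engines and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_A = ["http://www.example.org/", "http://example.com/page"]
ENGINE_B = ["http://example.org", "http://example.net/"]
print(merge_results([ENGINE_A, ENGINE_B]))
```

The first result of each engine points to the same site in two spellings, so it appears only once in the merged list.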

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
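
The difference between a modified page and a new page can be sketched with stored fingerprints of page contents. This is a minimal illustration that assumes the page texts have already been downloaded; the URLs and texts are hypothetical, and real alert services must also ignore trivial changes such as dates or advertisements.

```python
import hashlib

def fingerprint(content):
    """Return a SHA-256 fingerprint of the page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare stored fingerprints with the current page texts.

    previous: dict url -> fingerprint stored at the last visit.
    current:  dict url -> page text downloaded now.
    Returns (modified, new) as two sorted lists of URLs.
    """
    modified, new = [], []
    for url, text in current.items():
        if url not in previous:
            new.append(url)
        elif previous[url] != fingerprint(text):
            modified.append(url)
    return sorted(modified), sorted(new)

# Hypothetical state of the last visit and of the current visit.
PREVIOUS = {
    "http://example.org/a": fingerprint("old text"),
    "http://example.org/b": fingerprint("same text"),
}
CURRENT = {
    "http://example.org/a": "updated text",
    "http://example.org/b": "same text",
    "http://example.org/c": "brand new page",
}
print(detect_changes(PREVIOUS, CURRENT))
```

An alert service would run such a comparison periodically and send the two lists to the subscriber by email.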

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
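
The profile matching described above can be sketched as follows. This is only a minimal illustration in Python, not the way any particular bookshop implements its alerting service; the Book and Profile structures and the matching rule (any keyword in the title, or any shared subject category) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    subjects: set[str] = field(default_factory=set)


@dataclass
class Profile:
    keywords: set[str] = field(default_factory=set)   # matched against title words
    subjects: set[str] = field(default_factory=set)   # matched against subject categories


def matches(profile: Profile, book: Book) -> bool:
    """Return True when a new book fits the stored interest profile."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = bool({k.lower() for k in profile.keywords} & title_words)
    subject_hit = bool({s.lower() for s in profile.subjects}
                       & {s.lower() for s in book.subjects})
    return keyword_hit or subject_hit


def alerts(profile: Profile, new_books: list[Book]) -> list[str]:
    """Titles of the new books the user should be alerted about."""
    return [b.title for b in new_books if matches(profile, b)]
```

In such a design the profile stays on the server, and the list of newly published books is checked against it each time the database is updated.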

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
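
The idea of simultaneous searching can be sketched as follows: one query is sent in parallel to several targets and the results are merged. This is a hedged illustration only; the catalogues are hypothetical in-memory stand-ins built by make_catalogue, not the interfaces of real library or bookshop systems. A target that cannot be reached is simply reported as failed, which mirrors the remark that the result depends on the availability of the target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalogue(name, titles, available=True):
    """Build a stand-in search function for one target catalogue (hypothetical)."""
    def search(query):
        if not available:
            raise ConnectionError(f"{name} is not reachable")
        return [t for t in titles if query.lower() in t.lower()]
    return name, search


def meta_search(query, catalogues):
    """Send one query to all catalogues in parallel and merge the results.

    Returns (merged, failed): merged maps each title to the catalogues that
    hold it; failed lists the catalogues that could not be searched.
    """
    merged, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = [(name, pool.submit(search, query)) for name, search in catalogues]
        for name, future in futures:
            try:
                for title in future.result():
                    merged.setdefault(title, []).append(name)
            except ConnectionError:
                failed.append(name)
    return merged, failed
```

Only a plain substring query is passed on, because a meta-search service can offer no more than the search options that all target systems have in common.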

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases

AIM                                                   RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic  Library of Congress, British Library, (Amazon)
To search for book titles published before 1990       national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                          Library of Congress, British Library, Infoball
To find the price of a book                           Global Books in Print, Infoball, online bookshops
To be informed regularly about new books              Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or a thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
For a given term, a thesaurus records:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced with
added descriptors taken from a controlled list of words and
terms, the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
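
This use of a thesaurus before searching can be sketched as follows. The example is a minimal sketch under stated assumptions: the mini-thesaurus is hypothetical, and the Boolean syntax (AND, OR, quotation marks for phrases) is only one common convention that differs between search systems.

```python
# Hypothetical mini-thesaurus: synonyms (UF), narrower terms (NT), related terms (RT).
THESAURUS = {
    "sea": {"UF": ["ocean"], "NT": ["coastal waters"], "RT": ["marine environment"]},
}


def expand(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms found via the given relations."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return terms


def build_query(concepts, thesaurus, relations=("UF", "NT")):
    """OR together the variants of each concept, then AND the concepts."""
    groups = []
    for concept in concepts:
        variants = expand(concept, thesaurus, relations)
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)
```

Adding synonyms and narrower terms with OR is aimed at recall; combining the concepts with AND, and choosing more suitable terms, is aimed at precision.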

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
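
Both directions of citation searching can be illustrated with a small sketch. It assumes a hypothetical dictionary that maps each document to its list of references; real citation databases are of course far larger and are queried through their own interfaces.

```python
def backward_snowball(seed, references, depth=2):
    """Older publications reached by following reference lists from the seed."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier
                    for ref in references.get(doc, [])} - found - {seed}
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: for each document, the documents citing it."""
    cited_by = {}
    for doc, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(doc)
    return cited_by
```

Following reference lists only leads backwards in time; finding more recent publications requires the inverted structure, which is what a citation index provides. The size of a document's entry in that index is its citation count.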

211

***-

Information retrieval:
using citations (scheme)
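[Scheme: a time axis (Past, Now, Future). Starting from the seed document, snowball citation searching leads back to older publications; citation indexing leads forward to more recent publications that cite the seed document.]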

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
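
Generating these variants can be automated with a small helper. The sketch below assumes, as in the list above, that index.html is the default document of the directory; the function name url_variants is hypothetical, and the query syntax needed to search for links to each variant still depends on the search system used.

```python
def url_variants(url):
    """Return the common spellings under which the same page may be linked.

    Assumes that index.html is the default document of the directory,
    as in the variants listed above.
    """
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant is then searched separately and the results are combined, because a search system may treat the three spellings as different link targets.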

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

- contents
- summary
- structure
- overview
of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

- contents
- summary
- structure
- overview
of this tutorial / workshop

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this tutorial / workshop

7

- contents
- summary
- structure
- overview
of this tutorial / workshop

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
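
The difference between the two orderings can be illustrated with a minimal sketch. The site names and links below are hypothetical, and counting inbound links is only the simplified idea stated on this slide, not Google's actual ranking method.

```python
def rank_by_inlinks(sites, links):
    """Sort sites by number of distinct inbound links; ties alphabetically."""
    inbound = {site: set() for site in sites}
    for source, target in links:
        if target in inbound and source != target:
            inbound[target].add(source)
    return sorted(sites, key=lambda s: (-len(inbound[s]), s))

# Hypothetical directory entries and (source, target) links between sites.
SITES = ["alpha.example", "beta.example", "gamma.example"]
LINKS = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("x.example", "beta.example"),
]

alphabetical = sorted(SITES)               # plain alphabetical listing
by_links = rank_by_inlinks(SITES, LINKS)   # listing ordered by inbound links
```

Here gamma.example moves from the last to the first position, because two other sites link to it.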

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
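
The two components described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the pages are hypothetical stand-ins for what a robot would collect, and the query words are combined with AND, which is a choice made for this sketch.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediments",
    "http://example.org/c": "Civil engineering handbook",
}
INDEX = build_index(PAGES)
```

The query is answered from the prepared index, not by visiting the pages again, which is why results can be returned quickly but may lag behind the live WWW.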

49

**--

Internet indexes:
A9





http://A9.com
Available since 2005
This system is offered free of charge by Amazon.com.
It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
--as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi--
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
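
Both ranking criteria (many pointing pages, and "important" pointing pages) can be illustrated with a simplified PageRank-style iteration. This is a textbook sketch on a hypothetical five-page link graph, with a commonly used damping value of 0.85 as an assumption. It is not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a and b point to c, c points to d, e points to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
RANK = pagerank(LINKS)
```

In this graph, page c ranks above page a because more pages point to it. Page d ranks above page a although both receive exactly one link, because the page pointing to d (c) is itself highly ranked.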

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
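
What such an automatic expansion amounts to can be sketched as follows. The synonym table is hypothetical and the OR notation is only used to show the expanded query; how Google selects synonyms internally is not described in the source.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {"car": ["automobile", "vehicle"], "cheap": ["inexpensive"]}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no known synonyms: keep the word as is
        else:
            parts.append(token)
    return " ".join(parts)
```

Only the word marked with a tilde is expanded; the other words in the query are left unchanged.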

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





Offered free of charge by Microsoft.
You can search for WWW content.
Since 1998.
Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
»

Google

» AltaVista
»

All the Web (serving also Lycos)

»

Systems based on the INKTOMI database of WWW
pages.

»

(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
»

Yahoo!

»

(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
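
Recall and precision, as used above, can be computed for a single search as in this minimal sketch. The document identifiers are hypothetical, and in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of the retrieved documents is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents was retrieved?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant documents exist.
P, R = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).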

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
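
To make clear what is missing, the following minimal sketch shows manual right truncation as it is offered by many classical retrieval systems. The asterisk as truncation symbol and the word list are assumptions for this illustration.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without a truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Hypothetical index vocabulary.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]
```

Without truncation, a searcher has to enter each word variant (library, libraries, librarian) explicitly in the query.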

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
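
The meaning of a proximity operator can be illustrated with a minimal sketch. The function name and the default distance of 5 words are assumptions for this illustration; systems that offer NEAR differ in the distance they apply.

```python
import re

def near(text, word1, word2, max_distance=5):
    """True if word1 and word2 occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in positions1 for j in positions2)

# Hypothetical document text.
TEXT = "Marine pollution in the North Sea affects coastal fishing"
```

A plain AND query would retrieve this text for "marine fishing", although the two words are eight positions apart; a NEAR operator lets the searcher exclude such loose co-occurrences.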

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
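
Clustering on the fly can be illustrated with a very simple greedy method that works only on the retrieved result descriptions. This is a toy sketch with hypothetical snippets and a small stop word list; it is not the algorithm used by Vivisimo, which is not described in the source.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "is", "was"}

def terms(snippet, query):
    """Distinct words of a snippet, without stop words and query words."""
    words = set(re.findall(r"[a-z]+", snippet.lower()))
    return words - STOPWORDS - set(query.lower().split())

def cluster_results(snippets, query):
    """Greedily group snippets under the term shared by most ungrouped snippets."""
    remaining = list(range(len(snippets)))
    clusters = {}
    while remaining:
        counts = Counter(t for i in remaining for t in terms(snippets[i], query))
        best = max(counts.values(), default=0)
        if best < 2:
            # No term is shared by two or more snippets any more.
            clusters["other"] = remaining
            break
        # Ties are broken alphabetically to keep the outcome reproducible.
        label = min(t for t, c in counts.items() if c == best)
        clusters[label] = [i for i in remaining if label in terms(snippets[i], query)]
        remaining = [i for i in remaining if i not in clusters[label]]
    return clusters

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, French philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Turbo Pascal compiler for the Pascal programming language",
]
CLUSTERS = cluster_results(SNIPPETS, "pascal")
```

The four results fall into two groups, one labelled "language" and one labelled "philosopher", without any processing of the documents before the search.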

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
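
The integration step can be sketched as follows: several primary search systems are queried in parallel, and their ranked lists are merged with removal of repeated results. The two engine functions are hypothetical stand-ins that return fixed lists, and scoring by the sum of reciprocal rank positions is just one simple merging rule chosen for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def normalise(url):
    """Crude URL normalisation: ignore case and a trailing slash."""
    return url.lower().rstrip("/")

def meta_search(query, engines):
    """Query all engines in parallel and merge their ranked result lists."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            key = normalise(url)
            # A result scores higher when it is ranked high by several engines.
            scores[key] = scores.get(key, 0.0) + 1.0 / (position + 1)
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical stand-ins for two primary search systems.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b/"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

MERGED = meta_search("marine biology", [engine_one, engine_two])
```

The page found by both engines appears only once in the merged list and moves to the top. Note that lowercasing the complete URL is a simplification, since the path part of a URL can be case-sensitive.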

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
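
Both functions can be sketched in a few lines: detecting a modification of a known page by comparing fingerprints of its contents, and detecting new pages by comparing the current results of a stored query with those seen earlier. The function names and the in-memory storage are assumptions for this illustration; a real service would fetch the pages itself and keep its data on a server.

```python
import hashlib

def fingerprint(content):
    """Short, fixed-length fingerprint of the contents of a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def page_changed(url, content, known):
    """True if the contents differ from the version stored at the last check."""
    new = fingerprint(content)
    changed = url in known and known[url] != new
    known[url] = new  # remember the current version for the next check
    return changed

def new_results(query, current_urls, seen):
    """Return the URLs not reported before for this stored query."""
    previous = seen.setdefault(query, set())
    fresh = [u for u in current_urls if u not in previous]
    previous.update(current_urls)
    return fresh
```

An alert service would run such checks at regular intervals and send an email message only when page_changed returns True or new_results returns a non-empty list.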

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
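
The matching of a new book against a stored interest profile can be sketched as follows. This is only an illustration under stated assumptions: the `InterestProfile` structure, the word-based keyword match and the sample data are hypothetical and do not describe how Amazon or any other real service works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A stored user profile: keywords and/or subject categories (hypothetical structure)."""
    user: str
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile, title, categories):
    """Return True when a new book fits the profile.

    A book fits if any profile keyword occurs as a word in the title,
    or if the book shares at least one subject category with the profile.
    """
    title_words = {w.strip(".,:;!?()").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    book_cats = {c.lower() for c in categories}
    profile_cats = {c.lower() for c in profile.categories}
    return keyword_hit or bool(book_cats & profile_cats)


def alerts(profiles, new_books):
    """Yield (user, title) pairs for every new book that fits a stored profile."""
    for title, categories in new_books:
        for profile in profiles:
            if matches(profile, title, categories):
                yield profile.user, title
```

A real service would probably also match on authors, series or publishers; the sketch covers only the two profile forms named on the slide.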

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
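
The principle of simultaneous searching can be sketched as below. This is a minimal illustration, not the implementation of any of the services named in this tutorial: each target catalogue is represented by a hypothetical Python function that takes a query and returns a list of titles, and the duplicate detection (case-insensitive, whitespace-normalised titles) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, targets, timeout=5.0):
    """Send one query to several catalogue search functions in parallel.

    `targets` maps a catalogue name to a callable(query) -> list of titles.
    Returns (merged_results, failed_targets). A target that raises an error
    or does not answer within `timeout` seconds is reported as failed, which
    reflects the dependence on the availability of the target systems.
    Note: leaving the `with` block still waits for slow worker threads.
    """
    merged, seen, failed = [], set(), []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        for name, future in futures.items():
            try:
                titles = future.result(timeout=timeout)
            except Exception:
                failed.append(name)
                continue
            for title in titles:
                key = " ".join(title.lower().split())  # crude duplicate key
                if key not in seen:
                    seen.add(key)
                    merged.append((title, name))
    return merged, failed
```

Because every target receives the same plain query string, only the search options that all targets share can be used, which is one way to understand why the search options of meta-search services are rather limited.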

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
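
The use of a thesaurus before searching can be sketched as follows. The miniature thesaurus, its entries and the OR query syntax are assumptions made for this illustration only; real thesaurus systems and real databases differ in content and in query language.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
# BT = broader terms, NT = narrower terms, RT = related terms, UF = synonyms.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the given relations.

    Adding synonyms (UF) and narrower terms (NT) tends to increase recall;
    replacing a vague term by one narrower term tends to increase precision.
    """
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Multi-word terms are quoted so that they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

With the sample entry, `expand_query("sea", THESAURUS)` builds a query from the term, its synonym and its narrower term; a term that is not in the thesaurus is returned unchanged.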

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
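
The two directions of citation searching can be sketched with a small data structure. This is an illustration under stated assumptions: the reference lists are given as a hypothetical dictionary that maps each document to the documents it cites, and the document names are invented.

```python
def backward_search(seed, references, depth=2):
    """Snowball searching: follow reference lists from the seed to older documents.

    `references` maps a document to the list of documents it cites.
    `depth` limits how many reference lists in a row are followed.
    """
    found, frontier = set(), {seed}
    for _ in range(depth):
        cited = {ref for doc in frontier for ref in references.get(doc, [])}
        frontier = cited - found - {seed}
        found |= frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> set of citing documents.

    Looking up the seed in this index gives the more recent publications
    that cite it, which is what a citation index offers.
    """
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, set()).add(citing)
    return index
```

The backward search needs only the seed document itself and the documents it leads to, whereas the forward search needs an index built over a whole collection; this is why the second direction depends on a citation database.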

211

***-

Information retrieval:
using citations (scheme)
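[Scheme: a time axis running from past to future, with the seed document at "now". Snowball citation searching leads from the seed document back to the past; citation indexing leads from the seed document forward to the future.]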

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
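
Generating the variants to search for can be sketched as follows. The list of default file names (`index.html`, `index.htm`) is an assumption; which file a server delivers for a directory depends on its configuration.

```python
from urllib.parse import urlsplit


def url_variants(url, index_names=("index.html", "index.htm")):
    """Return the spellings under which the same page may be linked.

    The scheme (http:) is dropped, as in the variants listed above, and the
    path is given without trailing slash, with trailing slash, and with each
    assumed default file name.
    """
    parts = urlsplit(url)
    path = parts.path
    for name in index_names:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # strip the default file name
            break
    base = path.rstrip("/")
    host = "//" + parts.netloc
    variants = [host + base, host + base + "/"]
    variants += [host + base + "/" + name for name in index_names]
    return variants
```

Each variant can then be used in a separate link search, and the results can be combined, because a search system may treat the variants as different URLs.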

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
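
The difference between the two approaches can be illustrated with a small sketch that inspects one page. It is only an illustration with invented names, not the method of any search engine: it assumes exact string matching and looks only at `href` attributes of `a` elements and at the visible text.

```python
from html.parser import HTMLParser


class _Scanner(HTMLParser):
    """Collect hyperlink targets and visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)

    def handle_data(self, data):
        self.text.append(data)


def cites(page_html, target_url):
    """Return (linked, mentioned) for one page.

    linked:    the page contains a hyperlink to target_url
               (what a specialised link search would find).
    mentioned: the URL occurs as a string in the visible text
               (what a normal text search would find).
    """
    scanner = _Scanner()
    scanner.feed(page_html)
    scanner.close()
    linked = target_url in scanner.hrefs
    mentioned = target_url in "".join(scanner.text)
    return linked, mentioned
```

A page that lists the URL in a plain-text reference list without a hyperlink is found only by the normal search, which is why the two methods complement each other.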

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
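
The "more links received, higher rank" idea can be sketched as follows. This is only a minimal illustration, assuming the inbound links are already known as (source, target) pairs; the site names are invented, and the actual Google link analysis is more elaborate than a plain count.

```python
from collections import Counter

def rank_by_inbound_links(entries, links):
    """Sort directory entries by the number of inbound links, highest first.

    entries: list of site names listed in one directory category
    links:   list of (source, target) pairs, one per hyperlink
    """
    inbound = Counter(target for _source, target in links)
    # Ties are broken alphabetically, like the plain alphabetical listing.
    return sorted(entries, key=lambda site: (-inbound[site], site))

# Invented example data (assumption, not taken from any real directory).
entries = ["alpha.example", "beta.example", "gamma.example"]
links = [
    ("x.example", "gamma.example"),
    ("y.example", "gamma.example"),
    ("z.example", "alpha.example"),
]
ranked = rank_by_inbound_links(entries, links)
print(ranked)
```

The alphabetical order is kept only as a tie-breaker, so an entry without any inbound link ends up at the bottom of the list.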

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically (machine-made) from the contents, the texts,
of these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
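
The mechanism described above (collected pages, a machine-made index, a query interface) can be sketched in a few lines. This is a simplified illustration, not the implementation of any real search engine: the pages are a small in-memory dictionary standing in for what a robot would collect, and combining query words with AND is an assumption of this sketch.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stand-in for pages that a crawler ("robot") would have collected.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
index = build_index(pages)
hits = search(index, "marine science")
print(sorted(hits))
```

The query is answered from the index alone; the pages themselves are not visited at search time, which is why such a system can respond quickly.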

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine
All the Web, and the Internet database Inktomi
have been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
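
The two ranking criteria above (many pages point to a page; "important" pages point to it) can be illustrated with a simplified iterative link analysis. This is a sketch in the spirit of published link-analysis methods, not Google's actual algorithm; the damping value, the number of iterations and the tiny graph are assumptions chosen for illustration.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Iteratively score pages from the links between them.

    graph: dict mapping each page to the list of pages it links to.
    Every link target must also appear as a key of graph.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Invented graph: three pages point to "hub"; nothing points to "c".
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = link_rank(graph)
best = max(scores, key=scores.get)
print(best)
```

Because a page passes on a share of its own score, a link from a highly scored page counts for more than a link from an obscure one.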

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
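
What such an expansion amounts to can be sketched as follows. This is a hypothetical illustration: the synonym table is invented, and how Google selects synonyms internally is not described in this presentation.

```python
def expand_query(query, synonyms):
    """Expand each ~word into an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonym known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)

# Hypothetical synonym table; a real system would use a much larger one.
synonyms = {"car": ["automobile", "vehicle"]}
expanded = expand_query("cheap ~car rental", synonyms)
print(expanded)  # cheap (car OR automobile OR vehicle) rental
```

Only the word marked with the tilde is expanded; the other words of the query are left untouched.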

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
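
The terms recall and precision used above can be made concrete with a small calculation. This sketch uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: 4 documents retrieved, 3 relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall 2/3).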

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
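
Clustering "on the fly" can be sketched as grouping the retrieved titles under words that several of them share. This is a deliberately naive illustration and not the method used by Vivisimo, which is not described in this presentation; the stop word list and the result titles are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on"}

def cluster_results(query, titles, max_clusters=3):
    """Group result titles under frequent words that co-occur with the query."""
    query_words = set(query.lower().split())
    words_per_title = []
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        words -= STOPWORDS | query_words
        words_per_title.append(words)
        counts.update(words)
    # Headings: words shared by at least two results,
    # most frequent first, ties in alphabetical order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    headings = [word for word, count in ordered if count >= 2][:max_clusters]
    clusters = defaultdict(list)
    for title, words in zip(titles, words_per_title):
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(title)
    return dict(clusters)

# Invented result titles for the ambiguous query "pascal".
titles = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
    "Blaise Pascal biography",
]
clusters = cluster_results("pascal", titles)
for heading, members in clusters.items():
    print(heading, members)
```

Nothing is prepared before the search: the headings are derived only from the results that the query happened to return.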

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
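
The integration step with removal of repeated results can be sketched as follows. The two "engines" are stub functions returning fixed lists (an assumption that keeps the example runnable without network access), and interleaving by rank position is just one possible merging strategy.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_one(query):
    # Stub standing in for a remote search system; URLs in rank order.
    return ["http://a.example/", "http://b.example/", "http://c.example/"]

def engine_two(query):
    # Second stub, with partly overlapping results.
    return ["http://b.example/", "http://d.example/", "http://a.example/"]

def meta_search(query, engines):
    """Query all engines in parallel, then merge and remove repeated results."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank so that the top hits of every engine come first.
    longest = max(len(results) for results in result_lists)
    for position in range(longest):
        for results in result_lists:
            if position < len(results) and results[position] not in seen:
                seen.add(results[position])
                merged.append(results[position])
    return merged

merged = meta_search("marine biology", [engine_one, engine_two])
print(merged)
```

The queries run in separate threads, which corresponds to the "multi-threaded" name: the total waiting time is roughly that of the slowest engine rather than the sum of all of them.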

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
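
The distinction between modified and new pages can be sketched with stored fingerprints of page texts. This is a minimal illustration, assuming the page texts have already been downloaded into a dictionary; real alert services also have to fetch the pages and notify the subscriber, which is left out here.

```python
import hashlib

def fingerprint(text):
    """Return a short, stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def snapshot(pages):
    """Map every URL to the fingerprint of its current text."""
    return {url: fingerprint(text) for url, text in pages.items()}

def detect_changes(old_snapshot, pages):
    """Return (modified, new): URLs whose text changed, and URLs not seen before."""
    current = snapshot(pages)
    modified = sorted(
        url for url in current
        if url in old_snapshot and current[url] != old_snapshot[url]
    )
    new = sorted(url for url in current if url not in old_snapshot)
    return modified, new

# Invented page texts for two moments in time.
last_week = snapshot({
    "http://example.org/news": "Conference announced",
    "http://example.org/about": "About this site",
})
this_week = {
    "http://example.org/news": "Conference announced; programme online",
    "http://example.org/about": "About this site",
    "http://example.org/new-page": "Call for papers",
}
modified, new = detect_changes(last_week, this_week)
print(modified, new)
```

Only the fingerprints need to be kept between two checks, not the full texts of the tracked pages.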

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even services that compare the catalogues and prices of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
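
The matching of new books against a stored interest profile can be sketched as follows. This is only a minimal illustration, not the way any particular bookshop implements its alerts; the names `Book`, `InterestProfile`, `matches` and `new_book_alerts` are hypothetical, and the sketch assumes that keywords are compared with the words of the title.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    subjects: set = field(default_factory=set)


@dataclass
class InterestProfile:
    # Hypothetical profile: free keywords and/or subject categories.
    keywords: set = field(default_factory=set)
    subjects: set = field(default_factory=set)


def matches(profile, book):
    """Return True if the book fits the profile by keyword or by subject."""
    title_words = {w.strip(".,:;!?").lower() for w in book.title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    subject_hit = bool({s.lower() for s in profile.subjects}
                       & {s.lower() for s in book.subjects})
    return keyword_hit or subject_hit


def new_book_alerts(profile, new_books):
    """Titles of newly published books that should trigger an alert."""
    return [book.title for book in new_books if matches(profile, book)]
```

A profile with the keyword "oceanography" would, under these assumptions, trigger an alert for a new book titled "Introduction to Oceanography".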

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.
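
The idea of a union catalogue can be sketched as a merge of several library catalogues into one record per book, with a list of the libraries that hold it. The sketch below assumes that records are matched on ISBN, which is a simplification; it does not describe how Copac or any other real union catalogue deduplicates its records, and the function name `build_union_catalogue` is hypothetical.

```python
from collections import defaultdict


def build_union_catalogue(catalogues):
    """Merge per-library catalogues into one record per ISBN.

    `catalogues` maps a library name to a list of (isbn, title) pairs.
    The result maps each ISBN to its title and to the sorted list of
    libraries that hold the book.
    """
    titles = {}
    holdings = defaultdict(set)
    for library, records in catalogues.items():
        for isbn, title in records:
            titles.setdefault(isbn, title)  # keep the first title seen
            holdings[isbn].add(library)
    return {isbn: {"title": titles[isbn], "held_by": sorted(holdings[isbn])}
            for isbn in titles}
```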

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
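
The dependence on the availability of the target systems can be illustrated with a small sketch. It assumes that each target catalogue is represented by a hypothetical search function that either returns a list of titles or raises an exception when the system cannot be reached. For simplicity the targets are queried one after another here, whereas a real meta-search service would send the queries in parallel.

```python
def meta_search(query, targets):
    """Send one query to several catalogue search functions.

    `targets` maps a catalogue name to a callable that takes the query
    and returns a list of titles, or raises an exception when the
    catalogue is unavailable.
    Returns (results, unavailable): `results` maps each title to the
    catalogues that returned it, `unavailable` lists the failed targets.
    """
    results, unavailable = {}, []
    for name, search in targets.items():
        try:
            for title in search(query):
                results.setdefault(title, []).append(name)
        except Exception:
            # The overall result is incomplete, but still usable.
            unavailable.append(name)
    return results, unavailable
```

Only the simple query string is passed to every target, which also illustrates why the search options of such services are limited to what all target systems have in common.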

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)
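
BT and NT are inverse relations: if "water body" is a broader term of "sea", then "sea" is a narrower term of "water body". A minimal sketch of this, assuming that the thesaurus stores only the BT links and derives the NT links from them (the function name `narrower_terms` and the sample terms are purely illustrative):

```python
from collections import defaultdict


def narrower_terms(broader):
    """Derive NT links from BT links, since the two are inverse relations.

    `broader` maps a term to the list of its broader terms (BT).
    Returns a dict that maps each term to its sorted narrower terms (NT).
    """
    narrower = defaultdict(list)
    for term, broader_terms in broader.items():
        for bt in broader_terms:
            narrower[bt].append(term)
    return {term: sorted(nts) for term, nts in narrower.items()}
```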

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
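
The procedure described above can be sketched as a simple query expansion: look up the search term in a thesaurus, collect its synonyms (UF) and, optionally, its narrower terms (NT), and combine them in one query. The thesaurus content, the function name `expand_query` and the `OR` syntax with quoted phrases are assumptions made for this illustration; the query syntax that is actually required depends on the database that is searched.

```python
def expand_query(term, thesaurus, use_narrower=True):
    """Build an OR query from a term, its synonyms and its narrower terms.

    `thesaurus` maps a term to a dict with optional "UF" and "NT" lists.
    """
    entry = thesaurus.get(term, {})
    terms = [term] + entry.get("UF", [])
    if use_narrower:
        terms += entry.get("NT", [])
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus mainly improves precision.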

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
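
Both directions can be sketched with a small citation graph. The sketch assumes that the reference list of each document is known; the document identifiers and the function names `snowball` and `cited_by` are hypothetical. Following references leads back in time (the snowball effect), while inverting the reference lists gives a citation index that leads forward in time.

```python
def cited_by(references):
    """Invert reference lists into a citation index.

    `references` maps each document to the older documents it cites.
    The result maps each document to the newer documents that cite it.
    """
    index = {}
    for doc, refs in references.items():
        for ref in refs:
            index.setdefault(ref, []).append(doc)
    return index


def snowball(seed, references):
    """Follow references backwards from the seed document, transitively."""
    found, to_visit = [], list(references.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            to_visit.extend(references.get(doc, []))
    return found
```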

211

***-

Information retrieval:
using citations (scheme)
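[Scheme: a time axis running from Past through Now to Future. Starting from the seed document, snowball citation searching leads back to older publications, and citation indexing leads forward to more recent publications.]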

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
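
Counting the citations received by an article or by an author can be sketched as follows, assuming that the result of a citation search is available as a list of (citing article, cited article) pairs. The function name `citation_counts` and the data layout are hypothetical. In this sketch, an article with several authors counts fully for each of them, which is only one of several possible conventions.

```python
from collections import Counter


def citation_counts(citations, authors):
    """Count the citations received per article and per author.

    `citations` is a list of (citing, cited) article pairs.
    `authors` maps an article to the list of its authors.
    """
    per_article = Counter(cited for _citing, cited in citations)
    per_author = Counter()
    for article, count in per_article.items():
        for author in authors.get(article, []):
            per_author[author] += count
    return per_article, per_author
```

The counts are only estimates: they depend on the coverage of the database in which the citation search was carried out.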

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.
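
What a search engine does when it answers such a link query can be sketched on a small scale: extract the hyperlinks from a collection of pages and report the pages that link to a known URL. The sketch uses the HTML parser from the Python standard library; the class name `LinkCollector`, the function name `pages_linking_to` and the example URLs are hypothetical, and a real search engine works on a far larger, pre-built index.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target_url, pages):
    """Return the URLs of the pages that contain a link to target_url.

    `pages` maps a page URL to its HTML source.
    """
    citing = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target_url in collector.links:
            citing.append(url)
    return citing
```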

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
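
The variants listed above can be generated automatically. The sketch below assumes that the default document of the web site is named index.html or index.htm; other servers may use other default names, so the list is not necessarily complete. The function name `url_variants` is hypothetical.

```python
def url_variants(url):
    """Return spellings of a URL that may all point to the same page."""
    base = url
    for index_name in ("index.html", "index.htm"):
        if base.endswith("/" + index_name):
            base = base[: -len(index_name)]  # drop the default document name
            break
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]
```

Each variant can then be used in a separate link query, and the results can be combined.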

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
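
This “normal” way of searching can be sketched as a plain text search for the URL string. In the sketch, a simple substring test stands in for the exact phrase search of a real search engine, and the URL is also matched when it is written without the leading "http://". The function name `pages_mentioning` and the page collection are hypothetical.

```python
def pages_mentioning(url, pages):
    """Return the pages whose text contains the URL, with or without scheme.

    `pages` maps a page URL to its text. The target page itself is
    skipped, because a page that mentions its own URL is not a citation.
    """
    bare = url.split("://", 1)[-1]  # the URL without "http://"
    return [page for page, text in pages.items()
            if bare in text and page != url]
```

In contrast with a search in hyperlink fields, this approach also finds pages that mention the URL in their text without making a link to it.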

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 218


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
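
Browsing along a tree structure can be sketched with nested dictionaries, where each category contains its subcategories. The category names and the function name `browse` are purely illustrative; real directories use their own classification and often allow cross-links between branches, which a simple tree does not represent.

```python
def browse(directory, path):
    """Walk down a nested-dict subject directory along `path`.

    Returns the sorted names of the entries found at that level.
    """
    node = directory
    for category in path:
        node = node[category]  # raises KeyError for an unknown category
    return sorted(node)
```

No query has to be formulated: the user only chooses among the entries that are offered at each level.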

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram] The complete WWW: a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
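
The two components described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the page URLs and texts are invented, `build_index` and `search` are hypothetical names, and real Internet indexes are far more elaborate.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical pages, standing in for what a crawler ("robot") has collected.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine engineering",
    "http://example.org/c": "history of encyclopedias",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.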

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the
first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
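
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be sketched with a simplified PageRank-style iteration. This is not Google's actual algorithm: the function name `link_rank`, the damping value and the small link graph are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting links from high-scoring pages more.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: pages A, B and C all point to D; D points to A.
links = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph D receives the most links, and A, although it has only one inbound link, scores above B and C because that link comes from the highly ranked D.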

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
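
What such an expansion amounts to can be sketched as follows. The synonym table and the function `expand_query` are hypothetical; how Google selects synonyms internally is not described here.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query):
    """Turn each ~word into an OR-group of the word and its known synonyms."""
    parts = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + SYNONYMS.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: the word is searched as it stands.
                parts.append(word)
        else:
            parts.append(term)
    return " ".join(parts)


print(expand_query("~cheap car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query stay unchanged.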

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
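
Recall and precision can be made concrete with a small calculation: precision is the share of retrieved items that are relevant, recall the share of relevant items that were retrieved. The document identifiers below are invented for illustration and `precision_recall` is a hypothetical helper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 relevant documents exist, 3 overlap.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```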

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
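
For comparison, in retrieval systems that do support truncation, a term such as librar* matches every word that starts with that stem. The sketch below only illustrates the concept; the `*` syntax and the function `truncation_to_regex` are assumptions and do not describe any particular search system.

```python
import re


def truncation_to_regex(term):
    """Translate a truncated term such as 'librar*' into a whole-word regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


text = "The library trains librarians; several libraries share one catalogue."
pattern = truncation_to_regex("librar*")
print(pattern.findall(text))
```

Without truncation, the searcher has to enter each word form (library, libraries, librarians) separately.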

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
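
What a proximity operator such as NEAR would do can be sketched as follows: two words match only when they occur within a given number of words of each other. The function `near`, its default distance and the sample sentence are assumptions for illustration.

```python
def near(text, word_a, word_b, max_distance=5):
    """Return True if word_a and word_b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


text = "Open access journals are listed in several directories of scientific information."
print(near(text, "journals", "directories", max_distance=5))
print(near(text, "open", "information", max_distance=5))
```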

96

**--

Meta-search systems:
scheme 1
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with the components: User (In / Out); Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
[Diagram with the components: User; an Internet meta-search system; Internet search system 1 with its collected database 1; Internet search system 2 with its collected database 2; WWW pages]

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
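
Clustering on the fly means that the grouping is computed from the retrieved results themselves, at search time. The sketch below shows one very simple way to do that and is not Vivisimo's method: the function `cluster_results`, the stop word list and the result titles are all assumptions for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}


def cluster_results(titles, query):
    """Group result titles under the most widely shared word they contain (besides the query)."""
    query_words = set(query.lower().split())

    def keywords(title):
        return {w for w in title.lower().split() if w not in STOPWORDS | query_words}

    # Count in how many titles each candidate heading occurs.
    frequency = Counter(word for title in titles for word in keywords(title))
    clusters = defaultdict(list)
    for title in titles:
        candidates = keywords(title)
        if candidates:
            # Pick the word shared by most titles; ties are broken alphabetically.
            heading = min(candidates, key=lambda w: (-frequency[w], w))
        else:
            heading = "other"
        clusters[heading].append(title)
    return dict(clusters)


# Hypothetical result titles for an ambiguous one-word query.
titles = [
    "Pascal programming tutorial",
    "Blaise Pascal philosopher",
    "Pascal programming compiler",
    "Philosopher Blaise Pascal biography",
]
clusters = cluster_results(titles, "pascal")
print(clusters)
```

Nothing is prepared in advance here: the headings are derived only from the titles that the search returned.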

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with the components: User (In / Out); Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
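
The integration step with removal of repeated results can be sketched as follows. This is a minimal illustration, not the method of any particular meta-search system: `normalise` and `merge_results` are hypothetical helpers, the result lists are invented, and the URL comparison is deliberately crude (it ignores query strings).

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists from several engines, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists returned by two primary search systems.
engine_1 = ["http://www.example.org/a/", "http://example.org/b"]
engine_2 = ["http://example.org/a", "http://example.org/c"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Interleaving by rank keeps the top results of every engine near the top of the merged list; other merging strategies are equally possible.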

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
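
One way such a tracking program could detect modified and new pages is to compare fingerprints of page contents between two visits. The sketch below works on contents that have already been fetched; the URLs, the texts and the helper names `fingerprint` and `detect_changes` are assumptions for illustration, not a description of Copernic or of any alert service.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of a page's content (bytes)."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current):
    """Compare two {url: fingerprint} snapshots and report new and changed URLs."""
    new = sorted(url for url in current if url not in previous)
    changed = sorted(
        url for url in current if url in previous and current[url] != previous[url]
    )
    return new, changed


# Hypothetical snapshots of page contents taken at two moments.
yesterday = {
    "http://example.org/news": fingerprint(b"old text"),
    "http://example.org/about": fingerprint(b"about us"),
}
today = {
    "http://example.org/news": fingerprint(b"new text"),
    "http://example.org/about": fingerprint(b"about us"),
    "http://example.org/report": fingerprint(b"annual report"),
}
new, changed = detect_changes(yesterday, today)
print("new:", new)
print("changed:", changed)
```

Storing only fingerprints keeps the snapshots small; an alert service would then send an email listing the new and changed addresses.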

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both Amazon and a9.com allow full-text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases (accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
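
As an illustration of the idea above, the following minimal sketch shows how a stored interest profile could be matched against newly announced books. It is not how Amazon or any other service actually works; the class `InterestProfile`, the function `matches` and the sample book list are hypothetical names and data invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """A user profile stored on the server: keywords and/or subject categories."""
    keywords: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


def matches(profile: InterestProfile, title: str, category: str) -> bool:
    """Return True if a new book fits the profile (keyword in title, or same category)."""
    title_words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    category_hit = category.lower() in {c.lower() for c in profile.categories}
    return keyword_hit or category_hit


# Hypothetical profile and hypothetical list of newly published books.
profile = InterestProfile(keywords={"aquaculture"}, categories={"Marine science"})
new_books = [
    ("Aquaculture: Principles and Practice", "Fisheries"),
    ("A History of Flanders", "History"),
    ("Tides and Currents", "Marine science"),
]

# The titles for which an alert would be sent to the user.
alerts = [title for title, cat in new_books if matches(profile, title, cat)]
print(alerts)
```

A real service would run such a comparison each time its catalogue is updated and send the matching titles by e-mail.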

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
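
The principle of simultaneous searching can be sketched as follows. This is only an assumption-laden illustration: the two in-memory lists stand in for remote catalogues, and `search_catalogue` and `meta_search` are hypothetical names. Real meta-search services such as those listed in this section have to send queries over the network (for instance with Z39.50) and cope with differing record formats.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the catalogues of a library and a bookshop.
CATALOGUE_A = [
    {"title": "Marine Ecology", "isbn": "111"},
    {"title": "Coastal Management", "isbn": "222"},
]
CATALOGUE_B = [
    {"title": "Marine Ecology", "isbn": "111"},
    {"title": "Marine Pollution", "isbn": "333"},
]


def search_catalogue(records, query):
    """Simple title-word search; only this lowest common denominator is supported."""
    q = query.lower()
    return [r for r in records if q in r["title"].lower()]


def meta_search(query, catalogues):
    """Query all catalogues in parallel and merge the results by ISBN."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(
            pool.map(lambda recs: search_catalogue(recs, query), catalogues.values())
        )
    merged = {}
    for name, results in zip(catalogues, result_lists):
        for rec in results:
            entry = merged.setdefault(rec["isbn"], {"title": rec["title"], "sources": []})
            entry["sources"].append(name)
    return merged


hits = meta_search("marine", {"library": CATALOGUE_A, "bookshop": CATALOGUE_B})
for isbn, entry in hits.items():
    print(isbn, entry["title"], entry["sources"])
```

The sketch also shows why search options are limited: the common query must be simple enough for every target system to understand.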

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)
»sound / audio files (music, speeches...)
»video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
rather than the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term → BT (= Broader Term) → Term(s) with broader meaning
Term → NT (= Narrower Term) → Term(s) with narrower meaning
Term → RT (= Related Term) → Other term(s)
Term → UF (= Use(d) For) → Synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the records in the database to be searched have NOT been
given descriptors taken from a controlled list of words and
terms, the searcher can first consult one or several thesaurus
systems to find more, and more suitable, words and terms;
the searcher can then use these words and terms to
formulate a query for that database (to increase recall
and precision).
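
The procedure described above can be sketched in a few lines. The tiny thesaurus below is invented for illustration only (its entries are not taken from any real thesaurus), `expand_query` is a hypothetical helper, and the `OR` syntax with quoted phrases is an assumption: the exact query syntax differs from one search system to another.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Combine a term with its synonyms (UF) and narrower terms (NT) using OR."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for rel in relations:
        for candidate in entry.get(rel, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = expand_query("sea", THESAURUS)
print(query)
```

Adding synonyms and narrower terms tends to increase recall; choosing a narrower term instead of the original one tends to increase precision.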

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
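
Both directions of citation searching can be illustrated on a small, invented citation graph. The document names and the functions `snowball` and `cited_by` are hypothetical; a real citation index holds millions of such relations.

```python
# Hypothetical citation data: each document maps to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "new1": ["seed"],
    "new2": ["new1", "old1"],
}


def snowball(seed, cites, depth=2):
    """Backward search: follow reference lists from the seed into the past."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for ref in cites.get(doc, []):
                if ref != seed and ref not in found:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def cited_by(target, cites):
    """Forward search: what a citation index offers, the documents citing the target."""
    return sorted(doc for doc, refs in cites.items() if target in refs)


print(snowball("seed", CITES))   # older publications reached through references
print(cited_by("seed", CITES))   # more recent publications citing the seed
```

The backward search needs only the documents themselves; the forward search is possible only when somebody has built an index of all the references.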

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
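
The variants listed above can be generated automatically. In this minimal sketch the function name `url_variants` is hypothetical, and it is assumed that `index.html` is the default page of the site, as in the example on this slide; other servers may use another default file name.

```python
def url_variants(url: str) -> list[str]:
    """Return the three spellings under which the same start page may be linked."""
    # Drop the scheme (http:, https:) and keep the "//host/path" form of the slide.
    rest = "//" + url.split("//", 1)[-1]
    default_page = "index.html"  # assumption: default page name of the web site
    if rest.endswith("/" + default_page):
        rest = rest[: -len(default_page)]
    base = rest.rstrip("/")
    return [base + "/" + default_page, base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted separately to the link search of the chosen search engine, using the syntax explained in its help pages.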

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
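
A small sketch of how such "normal" queries could be prepared: the URL is placed between double quotes so that it is searched as an exact phrase, once with and once without the `http://` prefix, because citing pages may show the address in either form. The helper name `phrase_queries` is hypothetical, and whether a particular search engine indexes URL strings in the page text in this way is an assumption that should be checked in its help pages.

```python
def phrase_queries(url: str) -> list[str]:
    """Build exact-phrase queries for a known URL, with and without the scheme."""
    without_scheme = url.split("://", 1)[-1]
    queries = [f'"{url}"']
    if without_scheme != url:
        queries.append(f'"{without_scheme}"')
    return queries


for q in phrase_queries("http://www.computer.com/directory/subdirectory/web-page.html"):
    print(q)
```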

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 219

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view on citations received by scholarly articles (although still limited in comparison with commercial citation
databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: the complete WWW is covered in part by a global subject directory,
which can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
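
The mechanism described above can be illustrated with a minimal sketch in Python. It is not the implementation of any real search engine: the three sample pages and their example.org addresses are invented stand-ins for what a robot would collect, and the query is treated as an implicit Boolean AND, which is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical sample pages standing in for what a crawler ("robot") would collect.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data and marine science journals",
    "http://example.org/c": "Civil engineering handbook",
}

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build the index: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "marine")))
print(sorted(search(index, "marine hydrology")))
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.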

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005.
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
are owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are available
through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
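
The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be sketched as follows. This is a simplified, PageRank-style iteration written for illustration only; the actual algorithm and parameters used by Google are not described here, and the four-page link graph and the damping value 0.85 are assumptions.

```python
def rank_by_links(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it points to.
    Returns a score per page; all scores together sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for target in pages:
                    new_score[target] += share
        score = new_score
    return score

# Hypothetical link graph: A, B and D all point to C; C points to A.
LINKS = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A"]}
scores = rank_by_links(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this sketch C ranks first because three pages point to it, and A ranks second because the single page that points to it is the “important” page C.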

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
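
What such an automatic expansion amounts to can be sketched in a few lines of Python. The synonym table below is a small invented example, not the vocabulary used by Google, and the Boolean notation of the output is only a way to show the expanded query.

```python
# Hypothetical synonym table; a real system would use a much larger vocabulary.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Turn each query word into a group of alternatives.
    Only words preceded by a tilde (~) are expanded with synonyms."""
    groups = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:].lower()
            groups.append([word] + synonyms.get(word, []))
        else:
            groups.append([term.lower()])
    return groups

def to_boolean(groups):
    """Write the groups as a Boolean expression: OR inside a group, AND between groups."""
    parts = []
    for group in groups:
        if len(group) > 1:
            parts.append("(" + " OR ".join(group) + ")")
        else:
            parts.append(group[0])
    return " AND ".join(parts)

print(to_boolean(expand_query("cheap ~car rental")))
# prints: cheap AND (car OR automobile OR vehicle) AND rental
```

Note that "cheap" is not expanded, although it is in the table, because no tilde was put in front of it.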

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Labels: User (In / Out); Client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram. Labels: User (In / Out);
Client computer + Multi-threaded Internet search client program;
Internet / WWW; WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Labels: User (In / Out); Client computer + WWW client program;
WWW server computer; Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
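
Grouping results on the fly, using only the short texts returned for a query, can be illustrated with a toy sketch. This is NOT the method of Vivisimo, which is not documented here; the rule used (label each result with its most widely shared word) and the stop word list are assumptions, and the four result texts are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "by", "for", "of", "the"}  # assumed minimal stop word list

def terms(text, query_words):
    """Return the words of a result text, without stop words and query words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in query_words}

def cluster_results(query, snippets):
    """Group result texts under the word they share with most other results."""
    query_words = set(query.lower().split())
    term_sets = [terms(snippet, query_words) for snippet in snippets]
    # In how many results does each word occur?
    doc_freq = Counter(t for term_set in term_sets for t in term_set)
    clusters = defaultdict(list)
    for snippet, term_set in zip(snippets, term_sets):
        if term_set:
            # Most widely shared word first; alphabetical order breaks ties.
            label = min(term_set, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result texts for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal compiler for the programming language",
    "Pensées by the philosopher Blaise Pascal",
]
clusters = cluster_results("pascal", SNIPPETS)
for label, members in clusters.items():
    print(label, "->", members)
```

Nothing is prepared before the search: the groups are computed from the retrieved results only, which is the meaning of “on the fly” above.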

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Labels: User (In / Out);
Client computer + Multi-threaded Internet search client program;
Internet / WWW; WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
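
The integration of results with removal of repeated results can be sketched as follows. The scoring rule (sum of 1/rank over the search systems) and the rule that addresses with and without "www." are the same site are assumptions chosen for illustration; real meta-search systems may merge differently. The two result lists are invented.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to host + path, so that trivial variants compare equal.
    Assumption: 'www.' prefixes and trailing slashes do not matter."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Merge ranked lists of URLs from several search systems into one list
    without duplicates. A URL scores 1/rank in each list that contains it."""
    scores = {}
    first_seen = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    # Highest score first; on equal scores the URL seen first stays first.
    ordered = sorted(scores, key=lambda key: -scores[key])
    return [first_seen[key] for key in ordered]

# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.onefish.org/", "http://oceanportal.org/"]
engine_2 = ["http://oceanportal.org", "http://onefish.org/", "http://www.eevl.ac.uk/"]
merged = merge_results([engine_1, engine_2])
print(merged)
```

Five retrieved links are reduced to three, and a link found by both systems ends up above a link found by only one.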

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram. Labels:
• Internet: telnet, ftp, ... and the WWW
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers),
covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files
• PDF files]


***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
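
The difference between a modified page and a new page can be made concrete with a small sketch. It compares fingerprints of page texts between two visits; how a real alert service stores or fetches pages is not described here, so the snapshot structure and the example.org addresses are assumptions, and no network access is made.

```python
import hashlib

def fingerprint(content):
    """Return a short, fixed-length fingerprint of a page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """previous: URL -> fingerprint stored at the last visit.
    current: URL -> page text as retrieved now.
    Returns (new_pages, modified_pages) as sorted lists of URLs."""
    new_pages = sorted(url for url in current if url not in previous)
    modified_pages = sorted(
        url for url in current
        if url in previous and fingerprint(current[url]) != previous[url]
    )
    return new_pages, modified_pages

# Hypothetical snapshot from an earlier visit and the pages found today.
old_snapshot = {
    "http://example.org/news": fingerprint("Version 1"),
    "http://example.org/about": fingerprint("About us"),
}
todays_pages = {
    "http://example.org/news": "Version 2",
    "http://example.org/about": "About us",
    "http://example.org/new-page": "Hello",
}
new_pages, modified_pages = detect_changes(old_snapshot, todays_pages)
print("new:", new_pages)
print("modified:", modified_pages)
```

An alert service would then send an e-mail listing both groups to the user/subscriber and store the new fingerprints for the next visit.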

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for a
particular application is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
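
A minimal sketch of keyword-based profile matching, assuming the profile is a plain list of keywords and that a new title triggers an alert when any keyword occurs in it as a whole word. The titles and keywords are made-up examples, not data from any real bookshop.

```python
def matches_profile(title: str, keywords: list[str]) -> bool:
    """True if any profile keyword occurs as a whole word in the title (case-insensitive)."""
    words = set(title.lower().replace(",", " ").replace(":", " ").split())
    return any(keyword.lower() in words for keyword in keywords)


def new_book_alerts(new_titles: list[str], keywords: list[str]) -> list[str]:
    """Select the newly published titles that fit the stored interest profile."""
    return [title for title in new_titles if matches_profile(title, keywords)]


# Hypothetical interest profile and list of newly published titles.
profile = ["oceanography", "aquaculture"]
new_titles = [
    "Introduction to Oceanography",
    "A History of Printing",
    "Aquaculture: Principles and Practice",
]

alerts = new_book_alerts(new_titles, profile)
print(alerts)
```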

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
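
A minimal sketch of simultaneous searching, assuming each target catalogue can be called as a function that returns a list of titles. The three catalogues below are hypothetical in-memory stand-ins; real systems would be reached over the network, for instance through Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, catalogues):
    """Send the query to every catalogue in parallel and merge the results.

    `catalogues` maps a catalogue name to a search function returning titles.
    A catalogue that fails is skipped, because the result depends on the
    availability of the target systems.
    """
    def safe_search(search):
        try:
            return search(query)
        except Exception:
            return []

    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(safe_search, catalogues.values()))

    merged, seen = [], set()
    for titles in result_lists:
        for title in titles:
            key = title.casefold().strip()
            if key not in seen:  # remove duplicates across catalogues
                seen.add(key)
                merged.append(title)
    return merged


# Hypothetical target systems.
def library_a(query):
    books = ["Marine Ecology", "Coastal Management"]
    return [b for b in books if query.lower() in b.lower()]


def bookshop_b(query):
    books = ["marine ecology", "Marine Biology"]
    return [b for b in books if query.lower() in b.lower()]


def offline_catalogue(query):
    raise ConnectionError("target system unavailable")


results = meta_search(
    "marine",
    {"Library A": library_a, "Bookshop B": bookshop_b, "Offline": offline_catalogue},
)
print(results)
```

Only a plain keyword query is passed on, which illustrates why search options in such services are rather limited: the query has to be something every target system understands.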

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose IMAGES
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a given term:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other, related term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
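
A minimal sketch of this approach, assuming a thesaurus is held as a dictionary that uses the BT / NT / RT / UF relation codes. The entry for "sea" and the OR-query syntax are illustrative assumptions; the actual query syntax depends on the database to be searched.

```python
# Hypothetical thesaurus entry using the standard relation codes.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],
        "NT": ["inland sea"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR-query from a term plus its synonyms and narrower terms.

    Adding synonyms (UF) and narrower terms (NT) mainly raises recall;
    broader or related terms can be added through `relations` if wanted.
    """
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return " OR ".join(f'"{t}"' for t in terms)


query = expand_query("sea", THESAURUS)
print(query)
```

A term that is absent from the thesaurus is simply returned as a one-term query, so the function can be applied to every word of a search request.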

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
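
A minimal sketch of both directions, assuming the citation data is available as a mapping from each document to the earlier documents it cites. The document identifiers are hypothetical; a real citation index holds millions of such links.

```python
# Hypothetical citation data: each document maps to the earlier documents it cites.
CITES = {
    "D2005": ["C2003", "B2001"],
    "C2003": ["B2001"],
    "B2001": ["A1998"],
    "A1998": [],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, repeatedly (snow-ball effect)."""
    found, to_visit = [], list(cites.get(seed, []))
    while to_visit:
        doc = to_visit.pop(0)
        if doc not in found:
            found.append(doc)
            to_visit.extend(cites.get(doc, []))
    return found


def citing_documents(seed, cites):
    """Look forwards in time: which documents cite the seed document?"""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


older = snowball("C2003", CITES)
newer = citing_documents("B2001", CITES)
print(older)
print(newer, len(newer))  # the length is the number of citations received
```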

211

***-

Information retrieval:
using citations (scheme)
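[Diagram: a time axis running from past to future, with "now" marked.
Starting from the seed document, snowball citation searching leads back
into the past, and citation indexing leads forward in time.]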

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
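
A minimal sketch that generates these three variants from any one of them, so that each can be submitted in turn to the link search of the chosen search engine. The function name is hypothetical, and the sketch assumes the default page is called index.html.

```python
def url_variants(url: str) -> list[str]:
    """Return the index.html, trailing-slash and bare forms of a site URL."""
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


variants = url_variants("//web-server-computer.country/website/")
for variant in variants:
    print(variant)
```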

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
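
The tree structure that a user browses can be pictured as nested categories. The sketch below is only an illustration: the category names and the example.org URLs are invented, and real directories use more complicated structures (cross-links, aliases).

```python
# A hypothetical miniature directory: categories nest, leaves hold site URLs.
directory = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydrology"]},
    },
}


def browse(tree, path):
    """Follow category names from the root and return the node reached."""
    node = tree
    for category in path:
        node = node[category]  # raises KeyError for an unknown category
    return node


# Browsing needs no query: the user just picks a branch at each level.
subcategories = sorted(browse(directory, ["Science"]))
sites = browse(directory, ["Science", "Biology", "Marine biology"])
```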

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
- They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3 (!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.
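
The "simply stated" rule above (more received links, higher rank) can be sketched as a count of inbound links. This is a deliberately naive illustration, not Google's actual method; the site names and the link list are invented.

```python
from collections import Counter


def rank_by_inlinks(entries, links):
    """Order directory entries by how many distinct pages link to them."""
    inlinks = Counter()
    for source, target in set(links):  # count each (source, target) pair once
        if source != target:           # ignore self-links
            inlinks[target] += 1
    # Ties fall back to alphabetical order, i.e. the plain directory order.
    return sorted(entries, key=lambda site: (-inlinks[site], site))


# Hypothetical links between three sites listed in one directory category.
links = [("a", "c"), ("b", "c"), ("c", "b"), ("a", "c")]
ranking = rank_by_inlinks(["a", "b", "c"], links)
```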

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
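
The two components described above (an index built from collected page texts, and a search function over it) can be sketched as follows. This is a minimal illustration under stated assumptions: the pages are a hard-coded dictionary standing in for what a crawler would collect, the example.org URLs are invented, and all query words are combined with an implicit AND.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, as if already collected by the "robot".
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine sediment",
}
index = build_index(pages)
```

A query is answered from the index only; the pages themselves are not visited at search time, which is why results can lag behind the live WWW.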

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Also offers a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company, Yahoo!
• Since 2004, it seems that All the Web
no longer builds its own unique database of WWW sites,
but relies on the same WWW database as
its earlier competitor AltaVista.

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• The contents of some databases can also be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
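
The two ranking criteria above (many pages point to it, "important" pages point to it) can be illustrated with a simplified PageRank-style iteration. This is a sketch only: the damping value, the number of iterations and the four-page link graph are assumptions chosen for illustration, and Google's production ranking uses many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score over the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: a and b each receive exactly one link,
# but a is linked from the well-linked page c, b only from d.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["b"]}
ranks = pagerank(links)
```

In this graph page a ends up above page b although both receive a single link, because the page pointing to a is itself highly ranked.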

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
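
What the tilde asks for can be imitated with a small synonym list. The sketch below is not how Google implements the operator; the function name, the OR-group output and the one-entry thesaurus are assumptions made for illustration.

```python
def expand_query(query, thesaurus):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + thesaurus.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                parts.append(word)  # no synonyms known: keep the plain word
        else:
            parts.append(token)
    return " ".join(parts)


# Hypothetical thesaurus entry.
thesaurus = {"pollution": ["contamination"]}
expanded = expand_query("marine ~pollution", thesaurus)
```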

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:

» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
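
Recall and precision, the two measures named above, can be computed as follows for one search. This is a sketch: the document identifiers are invented, and in practice the full set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant ones exist, 2 found.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```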

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
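
For readers unfamiliar with truncation: a system that supports it lets a term such as fish* match every indexed word beginning with "fish". A minimal sketch, assuming the common `*` notation for right truncation and an invented vocabulary:

```python
def truncation_matches(pattern, vocabulary):
    """Return the vocabulary words matched by a right-truncated term."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical index vocabulary.
vocabulary = ["fish", "fishing", "fisheries", "finch"]
matches = truncation_matches("fish*", vocabulary)
```

Without this feature the searcher has to type the variants explicitly, for instance combined with OR.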

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
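
A proximity operator such as NEAR, where it is offered, checks how far apart two terms occur in a text. The sketch below shows the idea; the function name and the default distance of 5 words are assumptions, and systems that offer NEAR each define their own distance.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """True if the two terms occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Marine pollution in the North Sea"
close_enough = near(sentence, "marine", "sea", max_distance=5)
too_far = near(sentence, "marine", "sea", max_distance=3)
```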

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
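
Clustering "on the fly" means that the groups are derived only from the result list itself. The sketch below is a toy illustration and not Vivisimo's algorithm: each snippet is labelled with the term it shares with the most other snippets, and the snippets and stop word list are invented.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "in", "for", "to"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query term."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def terms(snippet):
        return {w for w in re.findall(r"\w+", snippet.lower())
                if w not in STOPWORDS and w not in query_words}

    # In how many snippets does each candidate term occur?
    doc_freq = Counter(t for s in snippets for t in terms(s))
    clusters = defaultdict(list)
    for snippet in snippets:
        candidates = terms(snippet)
        if candidates:
            # Most shared term wins; ties are broken alphabetically.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Blaise Pascal, philosopher",
    "Pascal programming language tutorial",
    "Pascal: philosopher and mathematician",
    "Free Pascal programming compiler",
]
clusters = cluster_results("pascal", snippets)
```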

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
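
The integration step, including the removal of repeated results, can be sketched as a round-robin merge of the ranked lists returned by each primary system. The URL normalisation used here (lowercasing and dropping a trailing slash) is a crude assumption, and the example lists are invented.

```python
def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = results[rank].rstrip("/").lower()  # crude normalisation
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical ranked results from two primary search systems.
system_1 = ["http://a.org/", "http://b.org"]
system_2 = ["http://b.org", "http://A.org", "http://c.org"]
merged = merge_results([system_1, system_2])
```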

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a search
in words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
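
The distinction between modified and new pages can be sketched by comparing stored fingerprints of page contents with the current contents. This is an assumption about how such a tracker could work, not a description of any particular service; the page contents are passed in as strings, so no network access is shown, and the URLs are invented.

```python
import hashlib


def fingerprint(content):
    """Short, fixed-length summary of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Sort URLs into new, changed and unchanged since the last check."""
    report = {"new": [], "changed": [], "unchanged": []}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical state saved at the previous check, and the pages seen now.
previous = {
    "http://example.org/1": fingerprint("old text"),
    "http://example.org/2": fingerprint("same text"),
}
current_pages = {
    "http://example.org/1": "new text",
    "http://example.org/2": "same text",
    "http://example.org/3": "a page that did not exist before",
}
report = detect_changes(previous, current_pages)
```

An alert service would run such a comparison periodically and e-mail the "new" and "changed" lists to the subscriber.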

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even price comparisons across the catalogues of bookshops
(as well as shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most recent two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
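As a rough illustration of how a stored interest profile could be matched against newly published books, here is a minimal Python sketch. The class, the matching rule and the sample records are all assumptions made for illustration; they do not describe how Amazon or any other service actually works.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical user profile stored on the server of an alert service."""
    keywords: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def matches(profile: InterestProfile, title: str, category: str) -> bool:
    """A new book matches if its category is selected or its title has a keyword."""
    title_words = {w.strip(".,:;!?").lower() for w in title.split()}
    keyword_hit = any(k.lower() in title_words for k in profile.keywords)
    return keyword_hit or category in profile.categories


profile = InterestProfile(keywords={"aquaculture"}, categories={"Marine science"})
new_books = [
    ("Aquaculture: principles and practice", "Agriculture"),
    ("Coastal ecosystems", "Marine science"),
    ("A history of printing", "History"),
]
alerts = [title for title, cat in new_books if matches(profile, title, cat)]
print(alerts)
```

The first book matches through a keyword, the second through a subject category, and the third does not match at all.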

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
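The principle of simultaneous searching can be sketched in Python as follows. The catalogues here are hypothetical in-memory lists standing in for remote systems, and the merging rule (grouping by lower-cased title) is an assumption; a real meta-search service would send the query to each target over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalogues.
CATALOGUES = {
    "Library A": ["Marine Ecology", "Ocean Currents"],
    "Library B": ["marine ecology", "Fish Biology"],
    "Bookshop C": None,  # simulates a target system that is unavailable
}


def search_one(name: str, query: str) -> list:
    """Search one catalogue; raise an error if it is unavailable."""
    titles = CATALOGUES[name]
    if titles is None:
        raise ConnectionError(f"{name} is not reachable")
    return [t for t in titles if query.lower() in t.lower()]


def meta_search(query: str) -> dict:
    """Query all catalogues in parallel and merge results by normalised title."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search_one, name, query) for name in CATALOGUES}
    for name, future in futures.items():
        try:
            titles = future.result()
        except ConnectionError:
            continue  # the result depends on the availability of each target
        for title in titles:
            merged.setdefault(title.lower(), []).append(name)
    return merged


print(meta_search("marine"))
# {'marine ecology': ['Library A', 'Library B']}
```

Only a plain keyword match is offered because it is the one search option that every target supports, which illustrates why search options in such services tend to be limited.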

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not support searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Diagram: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been indexed with
descriptors (words and terms) taken from a controlled
list, the searcher can first consult one or several
thesaurus systems to find additional and more suitable
words and terms, and then use these to formulate a query
for that database (to increase recall and precision).
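A minimal Python sketch of this use of a thesaurus for query formulation is given below. The miniature thesaurus, its entries and the Boolean OR syntax are assumptions for illustration; the query syntax that a particular database accepts should be checked in its own help pages.

```python
# Hypothetical miniature thesaurus; the entries are illustrative only.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["coastal waters"],
        "BT": ["water body"],
        "RT": ["tide"],
    },
}


def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Build a Boolean OR query from a term and its selected thesaurus relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Quote multi-word terms so that they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(expand_query("sea"))
# sea OR ocean OR "coastal waters"
```

Adding synonyms and narrower terms mainly increases recall; adding broader or related terms as well would widen the search further, usually at some cost in precision.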

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
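Both directions of citation searching can be illustrated with a small Python sketch. The citation data are hypothetical: each document is mapped to the earlier documents that it cites. Following these reference lists gives the snow-ball effect; inverting them gives what a citation index provides.

```python
# Hypothetical citation data: document -> earlier documents that it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc: str) -> set:
    """Follow reference lists backwards in time, starting from the seed document."""
    found, to_visit = set(), [doc]
    while to_visit:
        current = to_visit.pop()
        for cited in REFERENCES.get(current, []):
            if cited not in found:
                found.add(cited)
                to_visit.append(cited)
    return found


def citing_documents(doc: str) -> set:
    """Look up more recent documents that cite the given document."""
    return {d for d, refs in REFERENCES.items() if doc in refs}


print(sorted(snowball("seed")))          # ['old1', 'old2', 'older']
print(sorted(citing_documents("seed")))  # ['new1', 'new2']
```

The backward search needs only the documents themselves, whereas the forward search needs a database that has indexed the reference lists of many other publications.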

211

***-

Information retrieval:
using citations (scheme)
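[Scheme: a time axis (Past, Now, Future) with the seed document; snowball citation searching leads from the seed document to earlier publications, citation indexing leads from it to later publications.]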

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
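Counting citations per article or per author, as described above, can be sketched in a few lines of Python. The records are hypothetical; in practice they would come from the results of a citation search, and the counts remain estimates because no database covers all citing publications.

```python
from collections import Counter

# Hypothetical records: (citing article, cited article, author of cited article).
CITATIONS = [
    ("A3", "A1", "Smith"),
    ("A4", "A1", "Smith"),
    ("A4", "A2", "Smith"),
    ("A5", "A6", "Jones"),
]

# Number of citations received by each article and by each author.
per_article = Counter(cited for _, cited, _ in CITATIONS)
per_author = Counter(author for _, _, author in CITATIONS)

print(per_article["A1"])    # 2
print(per_author["Smith"])  # 3
```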

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document
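What a search engine does behind such a link search can be approximated with the Python sketch below, which extracts hyperlinks from a few pages and lists those that link to a target URL. The pages and URLs are hypothetical, and a real search engine works on a far larger, pre-built index.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_linking_to(target: str, pages: dict) -> list:
    """Return the URLs of the pages that contain a link to the target URL."""
    citing = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if target in collector.links:
            citing.append(url)
    return citing


# Hypothetical miniature web of three pages.
PAGES = {
    "http://a.example/": '<p>See <a href="http://target.example/">this page</a>.</p>',
    "http://b.example/": "<p>No links here.</p>",
    "http://c.example/": '<a href="http://target.example/">useful</a>',
}
print(pages_linking_to("http://target.example/", PAGES))
```

The length of the returned list is the number of web citations to the target page within this small collection.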

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
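Generating these variants can be automated, as in the following Python sketch. It assumes that the default document of the site is named index.html (or index.htm), which is common but not universal; the function name is hypothetical.

```python
def url_variants(url: str) -> list:
    """Return common spellings of a URL that other pages may link to."""
    base = url
    # Strip an explicit default document name, if present (an assumption).
    for suffix in ("/index.html", "/index.htm"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    base = base.rstrip("/")
    return [base + "/index.html", base + "/", base]


for variant in url_variants("http://web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query, and the result sets combined.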

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover only a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, the supporting classification structure is then not
well exploited, and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

Scheme: starting from the complete WWW,
a global subject directory can lead to
• a directory limited to sources in/of a country or region
• a directory restricted to a specific subject domain (“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
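
The mechanism described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the three pages and their example.org addresses are hypothetical stand-ins for what a “robot” would collect, and real Internet indexes are far more elaborate. The sketch builds an index from words to pages and then answers a query from that index, not from the live Internet.

```python
from collections import defaultdict
import re

# Hypothetical mini-collection standing in for pages collected by the "robot".
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain ALL query words (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine")))
print(sorted(search(INDEX, "marine hydrology")))
```

The query is answered from the prepared index only, which is why a page that the robot has not yet collected cannot be found.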

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered:
the first part of many files in other file formats is also
indexed, such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense
that a maximum of 10 words could be used in a query.)
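
The two ranking criteria above (many pages point to a page; “important” pages point to it) can be sketched with a simplified, PageRank-style iteration. This is a generic textbook sketch, not the actual algorithm used by Google; the four-page link graph, the damping value 0.85 and the function name link_rank are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or high-scoring pages, link to it (simplified PageRank-style sketch)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new[target] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: A and B both point to C, C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = link_rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this small graph, C scores higher than A or B because two pages point to it, and D benefits from being pointed to by the well-linked page C.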

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
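
What such an automatic expansion amounts to can be sketched as follows. This is not how Google implements the tilde operator; it is a minimal sketch that assumes a small, hand-made thesaurus (the THESAURUS dictionary below is hypothetical) and writes the expanded query in generic Boolean notation with OR groups.

```python
# Hypothetical, hand-made thesaurus; a real one would be far larger.
THESAURUS = {
    "fish": ["fishes", "ichthyology"],
    "sea": ["ocean", "marine"],
}

def expand_query(query, thesaurus):
    """Replace each word marked with ~ by an OR group of the word and its synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            term = word[1:]
            alternatives = [term] + thesaurus.get(term.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(term)
        else:
            parts.append(word)
    return " ".join(parts)

expanded = expand_query("pollution ~sea ~fish larvae", THESAURUS)
print(expanded)
# pollution (sea OR ocean OR marine) (fish OR fishes OR ichthyology) larvae
```

Only the words marked with a tilde are expanded; the other words are passed on unchanged.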

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the ambiguity of
the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: up to 2004, Yahoo! did not yet have its own search
system and relied on Google for text searches)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found,
for instance, in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
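
Recall and precision, mentioned above, can be computed as follows. This is a minimal sketch: the document identifiers d1 to d5 are made up, and in practice the complete set of relevant documents is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents retrieved, 3 documents are relevant in total.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(precision)  # 0.5, because 2 of the 4 retrieved documents are relevant
print(recall)     # 2 of the 3 relevant documents were retrieved
```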

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
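
When truncation is not available, the searcher has to write out the word variants explicitly. The sketch below shows what a truncation operator would do: it expands a right-truncated term into the matching words of a vocabulary and combines them with OR. The word list and the function name expand_truncation are hypothetical; a real system would take the variants from its own index.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term such as 'librar*' into the matching words."""
    if not pattern.endswith("*"):
        return [pattern]
    stem = pattern[:-1].lower()
    return sorted(w for w in set(vocabulary) if w.lower().startswith(stem))

# Hypothetical vocabulary standing in for the words known to an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty", "literature"]

variants = expand_truncation("librar*", VOCABULARY)
query = " OR ".join(variants)
print(query)  # librarian OR libraries OR library
```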

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram with these components: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

97

**--

Meta-search systems:
scheme 2
[Diagram with these components: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
The following terms are used more or less as synonyms:
• “multi-threaded search systems”
• “multiple search systems”
• “multi-search systems”
• “meta-search systems”
• “intelligent search agents”
• “federated search systems”
• “portals” (but this word has also other meanings)
• ...

99

**--

Meta-search systems: server-based:
scheme
[Diagram with these components: User (In / Out);
client computer + WWW client program;
WWW server computer;
Internet / WWW;
WWW server computers with Internet search systems]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
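
The relations above (one user query, forwarded to several search systems that each consult their own collected database) can be sketched as a parallel fan-out. The two engine functions are hypothetical stand-ins that return fixed lists; a real meta-search system would contact the search systems over the Internet instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search systems;
# each returns a ranked list of URLs from its own collected database.
def engine_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_two(query):
    return ["http://example.org/b", "http://example.org/c"]

ENGINES = {"engine 1": engine_one, "engine 2": engine_two}

def meta_search(query, engines):
    """Send the same query to all engines in parallel and collect their result lists."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("marine biology", ENGINES)
for name, urls in results.items():
    print(name, urls)
```

Because the queries run in parallel, the total waiting time is roughly that of the slowest search system rather than the sum of all of them.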

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
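
Clustering on the fly can be illustrated with a deliberately simple sketch. It is not the method used by Vivisimo, which is not described here; it only assumes that result titles are available and groups each title under the most widely shared word other than the query words. The titles, the stop word list and the function name cluster_results are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def cluster_results(query, titles):
    """Group result titles under the most widely shared word other than the query words."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    word_sets = []
    for title in titles:
        words = set(re.findall(r"[a-z]+", title.lower()))
        word_sets.append(words - query_words - STOPWORDS)
    # Count in how many titles each candidate label occurs.
    frequency = Counter(w for words in word_sets for w in words)
    clusters = defaultdict(list)
    for title, words in zip(titles, word_sets):
        if words:
            # The most frequent word wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-frequency[w], w))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)

# Hypothetical result titles for the ambiguous query "pascal".
TITLES = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal programming compiler",
    "Pascal the philosopher: life and works",
]
clusters = cluster_results("pascal", TITLES)
for label, members in clusters.items():
    print(label, members)
```

The grouping is computed from the retrieved results only, at search time, so no pre-processing of the documents is required.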

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram with these components: User (In / Out);
client computer + multi-threaded Internet search client program;
Internet / WWW;
WWW server computers with Internet search systems]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: in the same time, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
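
Integration of results with removal of repeated results can be sketched as follows. This is a minimal sketch under stated assumptions: the result lists are made up, the round-robin merge is only one possible way to combine rankings, and the normalise function is a crude, hypothetical rule that treats addresses such as http://www.dogpile.com and http://dogpile.com/ as the same page.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce trivial URL variants (host case, 'www.' prefix, trailing slash) to one key.
    Crude on purpose: query strings are ignored."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path

def merge_results(result_lists):
    """Interleave ranked lists round-robin and keep only the first copy of each page."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                key = normalise(lst[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
merged = merge_results([
    ["http://www.dogpile.com/", "http://example.org/x"],
    ["http://dogpile.com", "http://example.org/y"],
])
print(merged)
```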

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram: what is accessible through the Internet]
• Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes
• Word files and PDF files
• Databases and file archives accessible through
the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• telnet, ftp, ...

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
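
A minimal sketch of how tracking modifications in one existing page could work, assuming the tracking system keeps an earlier snapshot of the page text. The function names and the sample texts are hypothetical; the slides do not describe how any particular service detects changes.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_text: str, new_text: str) -> bool:
    """Report whether two snapshots of the same page differ in content."""
    return fingerprint(old_text) != fingerprint(new_text)


# Hypothetical snapshots of one page, taken on two different days.
snapshot_earlier = "Open access journals:  12 titles"
snapshot_later = "Open access journals: 13 titles"
print(has_changed(snapshot_earlier, snapshot_later))
```

Comparing digests instead of full texts means that only a short fingerprint per page has to be stored between two checks.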

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
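
A minimal sketch of matching new book titles against an interest profile stored as keywords. The profile, the titles and the function names are hypothetical and only illustrate the principle; they do not describe how Amazon or any other service works internally.

```python
from typing import List, Set


def matches_profile(title: str, keywords: Set[str]) -> bool:
    """True when at least one profile keyword occurs as a word in the title."""
    words = {w.strip(".,:;!?()").lower() for w in title.split()}
    return any(k.lower() in words for k in keywords)


def new_book_alerts(new_titles: List[str], keywords: Set[str]) -> List[str]:
    """Select the new titles that should trigger an alert for this profile."""
    return [t for t in new_titles if matches_profile(t, keywords)]


# Hypothetical interest profile and list of newly published titles.
profile = {"oceanography", "fisheries"}
titles = [
    "Introduction to Oceanography",
    "Medieval Poetry",
    "Fisheries Management: A Primer",
]
print(new_book_alerts(titles, profile))
```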

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and book dealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
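
A minimal sketch of simultaneous searching, assuming that each target catalogue can be called as a function that returns a list of titles. The three catalogues below are hypothetical in-memory stand-ins; real services would typically query the targets over a protocol such as Z39.50.

```python
from typing import Callable, Dict, List

Catalogue = Callable[[str], List[str]]


def meta_search(query: str, catalogues: Dict[str, Catalogue]) -> Dict[str, List[str]]:
    """Send one query to every catalogue and merge the results.

    Returns a mapping from each normalised title to the catalogues holding it.
    A catalogue that is unavailable is skipped, so the result depends on
    the availability of the target systems.
    """
    merged: Dict[str, List[str]] = {}
    for name, search in catalogues.items():
        try:
            found = search(query)
        except Exception:
            continue  # target system unavailable: carry on with the others
        for title in found:
            key = title.strip().lower()  # crude duplicate detection
            merged.setdefault(key, []).append(name)
    return merged


# Hypothetical target systems.
def library_a(q: str) -> List[str]:
    return [t for t in ["Marine Ecology", "Coastal Engineering"] if q.lower() in t.lower()]


def bookshop_b(q: str) -> List[str]:
    return [t for t in ["marine ecology", "Marine Biology"] if q.lower() in t.lower()]


def offline_c(q: str) -> List[str]:
    raise ConnectionError("target system unavailable")


results = meta_search(
    "marine",
    {"library_a": library_a, "bookshop_b": bookshop_b, "offline_c": offline_c},
)
print(results)
```

Only the simplest common query form (one search string) is passed on to every target, which illustrates why the search options of such services are rather limited.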

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
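
A minimal sketch of this use of a thesaurus: terms found through the thesaurus relations are added to the original search term to form a broader query. The thesaurus entry below is invented for illustration, and the OR syntax is an assumption; the actual query syntax depends on the database that is searched.

```python
from typing import Dict, List, Sequence

# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["lagoon"],
        "BT": ["water body"],
        "RT": ["coast"],
    },
}


def expand_query(
    term: str,
    thesaurus: Dict[str, Dict[str, List[str]]],
    relations: Sequence[str] = ("UF", "NT"),
) -> str:
    """Combine a term with its thesaurus terms into one OR query.

    Synonyms (UF) and narrower terms (NT) are added by default.
    Terms consisting of several words are quoted as exact phrases.
    """
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea", THESAURUS))
print(expand_query("sea", THESAURUS, relations=("UF", "BT")))
```

Adding synonyms and narrower terms is meant to increase recall. Choosing which relations to follow is left to the searcher, because broader terms can lower precision.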

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
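
A minimal sketch of both directions of citation searching, assuming the citation data are available as a mapping from each document to the documents it cites. The document identifiers are hypothetical.

```python
from typing import Dict, List

# Hypothetical citation data: each document points to the older documents it cites.
CITES: Dict[str, List[str]] = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(seed: str, cites: Dict[str, List[str]]) -> List[str]:
    """Follow references backwards in time, starting from the seed document."""
    found: List[str] = []
    queue = list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(cites.get(doc, []))  # the snowball effect
    return found


def citing_documents(target: str, cites: Dict[str, List[str]]) -> List[str]:
    """Look up more recent documents that cite the target, as a citation index does."""
    return sorted(doc for doc, refs in cites.items() if target in refs)


print(snowball("seed", CITES))
print(citing_documents("seed", CITES))
```

The first function only needs the reference lists of documents that are already in hand. The second one needs an index built over many documents, which is what a citation index provides.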

211

***-

Information retrieval:
using citations (scheme)
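[Scheme: a time axis running from past to future, with the seed
document at “now”. Snowball citation searching leads from the seed
document to the past; citation indexing leads to the future.]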

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
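
A minimal sketch that generates these variants from one known URL, so that none of them is forgotten in a link search. The function name and the default index page name are assumptions; some sites use another default page.

```python
from typing import List


def url_variants(url: str, index_page: str = "index.html") -> List[str]:
    """Return the three common ways in which the same site can be linked."""
    base = url.rstrip("/")
    if base.endswith("/" + index_page):
        base = base[: -len(index_page) - 1]
    return [f"{base}/{index_page}", f"{base}/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant can then be submitted as a separate link query, following the syntax of the search system that is used.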

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
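
The idea of ordering directory entries by links received can be sketched as follows. This is only an illustration: the site names and link data are made up, and counting links is a simplification, not the actual method used by Google.

```python
from collections import Counter

# Hypothetical link data: (source site, target site) pairs.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("d.example", "c.example"),
    ("d.example", "b.example"),
]

# Entries of one directory category (hypothetical).
category = ["d.example", "a.example", "c.example", "b.example"]


def rank_by_inlinks(entries, links):
    """Sort entries by number of links received, most-linked first."""
    inlinks = Counter(target for _source, target in links)
    # Start from alphabetical order; sorted() is stable, so ties stay alphabetical.
    return sorted(sorted(entries), key=lambda site: -inlinks[site])


alphabetical = sorted(category)          # DMOZ-style ordering
ranked = rank_by_inlinks(category, links)  # link-based ordering
print(alphabetical)
print(ranked)
```

The alphabetical list and the link-based list contain the same entries; only the order differs.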

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems make it possible to search for and to locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
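
The two components described above (an index built by machine from page texts, and a search function on top of it) can be sketched in a few lines. This is a toy illustration under stated assumptions: the pages are hypothetical and already collected, the crawling step is left out, and real search engines are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages as a crawler ("robot") might have collected them.
pages = {
    "http://site.example/a": "Marine biology of the North Sea",
    "http://site.example/b": "Hydrology and marine sediments",
    "http://site.example/c": "Biology textbooks for students",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the prepared index, not by visiting the pages at query time, which is why such systems can respond quickly.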

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the
first part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Until 2005, a Google WWW search query was limited
to a maximum of 10 words.)
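
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a small iterative link analysis in the spirit of the published PageRank idea. This is a sketch only: the link graph is hypothetical, the damping value 0.85 and the number of iterations are assumptions, and it is not the algorithm as run by Google.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages; a link from a high-scoring page counts more."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score


# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": ["C"],
    "E": ["D"],
}
score = link_rank(links)
print(sorted(score, key=score.get, reverse=True))
```

In this graph, page D receives only two links, but one of them comes from the highly scored page C, so D ends up far above pages A, B and E.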

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
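
Expanding a query with thesaurus terms can be sketched as follows. The mini-thesaurus, the function name and the Boolean output syntax are assumptions made for this illustration; they do not describe the thesaurus to which Google links.

```python
# Hypothetical mini-thesaurus; a real one would be far larger.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "sea": ["ocean"],
}


def expand_query(words, expand):
    """Build a Boolean query in which selected words are OR-ed with synonyms."""
    parts = []
    for word in words:
        synonyms = THESAURUS.get(word, []) if word in expand else []
        if synonyms:
            parts.append("(" + " OR ".join([word] + synonyms) + ")")
        else:
            parts.append(word)
    return " AND ".join(parts)


query = expand_query(["sea", "pollution"], expand={"sea"})
print(query)
```

Only the words selected by the searcher are expanded, so the searcher keeps control over how far the query is broadened.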

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, Yahoo! has offered an independent system that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
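
Recall and precision, the two quality measures named above, follow the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents are relevant,
# and 2 of the retrieved documents are among the relevant ones.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
print(precision, recall)
```

A specialised search system can improve both figures at once because it indexes fewer irrelevant sources and covers its own domain more completely.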

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + WWW client program; Internet / WWW; WWW server computer; WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
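
Grouping results on the fly, using nothing but the retrieved items themselves, can be illustrated with a deliberately naive sketch. It labels each result title with the most widely shared term it contains. The titles, the stop word list and the labelling rule are assumptions made for this example; this is not the method used by Vivisimo.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "of", "on", "the"}


def terms(text, query):
    """Distinct lower-case words of a text, minus stop words and query words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - set(query.lower().split())


def cluster(results, query):
    """Group result titles under the most widely shared term they contain."""
    doc_freq = Counter()
    for title in results:
        doc_freq.update(terms(title, query))
    clusters = defaultdict(list)
    for title in results:
        candidates = terms(title, query)
        if candidates:
            # Highest document frequency first; ties broken alphabetically.
            label = min(candidates, key=lambda t: (-doc_freq[t], t))
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the ambiguous query "pascal".
results = [
    "Blaise Pascal philosopher and mathematician",
    "Pascal programming language tutorial",
    "Pascal the philosopher on faith",
    "Free Pascal compiler for the Pascal programming language",
]
clusters = cluster(results, "pascal")
for label, titles in clusters.items():
    print(label, titles)
```

All counting happens after the results have been retrieved, so no document needs to be classified in advance.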

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Elements shown: User (In / Out); client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
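
The integration step mentioned above, merging result lists and removing repeated results, can be sketched as follows. The URLs are hypothetical, and the interleaving and the crude URL normalisation are assumptions for this example; actual meta-search systems may merge quite differently.

```python
from itertools import zip_longest


def normalise(url):
    """Crude URL normalisation so that trivially different forms match."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked lists from several engines, dropping repeats."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:
                continue  # this engine returned fewer results
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical ranked results from two primary search systems.
engine_1 = ["http://www.a.example/", "http://b.example/x"]
engine_2 = ["http://b.example/x/", "http://c.example/", "http://a.example"]
merged = merge_results(engine_1, engine_2)
print(merged)
```

Paths are compared case-sensitively on purpose, because on many servers paths that differ only in case are different addresses.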

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Diagram of the Internet and the WWW. Labels shown: telnet, ftp, ...; databases and file archives accessible through the Internet (CGI, ASP, ...); static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes; information accessible only when passwords are used; rapidly changing information, such as news; Word files; PDF files.]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
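
A minimal sketch of how tracking changes in one page could work, assuming the service stores a fingerprint of the page text and compares it at the next visit. The function names and the sample texts are hypothetical, and fetching the page over the Internet is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(stored_fingerprint, current_text):
    """Compare the stored fingerprint with that of the text fetched now."""
    return fingerprint(current_text) != stored_fingerprint


# Hypothetical example: the text of one page at two successive visits.
first_visit = "Conference programme: to be announced."
second_visit = "Conference programme: now available."

stored = fingerprint(first_visit)
page_was_modified = has_changed(stored, second_visit)
```

Only the fingerprint has to be kept between visits, not the full page. Finding completely new pages needs a different approach, for instance repeating a stored query against an Internet index.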

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To find book titles published before 1990 | ?
To find a book title through a title search | ?
To find the price of a book | ?
To be informed regularly about new books | ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of prices across the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
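
A minimal sketch of matching new books against a stored interest profile, assuming the profile holds keywords and subject categories as described above. The record layout, the book titles and the profile are invented for illustration and do not reflect how any particular bookshop works.

```python
def matches_profile(book, profile):
    """True if the title contains a profile keyword or the
    subject categories of book and profile overlap."""
    title_words = {word.strip(".,:;").lower() for word in book["title"].split()}
    keyword_hit = bool(title_words & {k.lower() for k in profile["keywords"]})
    category_hit = bool(set(book["categories"]) & set(profile["categories"]))
    return keyword_hit or category_hit


# Hypothetical interest profile stored on the server of the system.
profile = {"keywords": ["aquaculture", "fisheries"], "categories": ["Marine science"]}

# Hypothetical list of newly published books.
new_books = [
    {"title": "Sustainable Aquaculture: An Introduction", "categories": ["Agriculture"]},
    {"title": "Coastal Oceanography", "categories": ["Marine science"]},
    {"title": "Medieval Poetry", "categories": ["Literature"]},
]

# Titles for which an alert would be sent to the user.
alerts = [book["title"] for book in new_books if matches_profile(book, profile)]
```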

155

****

Online Public Access Catalogues of
libraries
• The catalogues of libraries are useful mainly to find
older books.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
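
A minimal sketch of simultaneous searching, assuming each target catalogue can be queried independently and that records can be matched on an identifier. The three in-memory catalogues, their names and the identifiers are hypothetical stand-ins; a real service would send the query to remote systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the target catalogues.
CATALOGUES = {
    "library_a": [
        {"id": "bk-1", "title": "Marine Ecology"},
        {"id": "bk-2", "title": "Marine Pollution"},
    ],
    "library_b": [{"id": "bk-2", "title": "Marine Pollution"}],
    "bookshop_c": [{"id": "bk-3", "title": "Freshwater Biology"}],
}


def search_catalogue(name, query):
    """Simple title search in one catalogue; each hit is tagged with its source."""
    q = query.lower()
    return [dict(rec, source=name) for rec in CATALOGUES[name] if q in rec["title"].lower()]


def meta_search(query):
    """Query all catalogues in parallel and merge the hits, grouping duplicates by id."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_catalogue(name, query), CATALOGUES))
    merged = {}
    for records in result_lists:
        for rec in records:
            entry = merged.setdefault(rec["id"], {"title": rec["title"], "sources": []})
            entry["sources"].append(rec["source"])
    return merged


results = meta_search("marine")
```

The sketch offers only one search option, a word in the title. This mirrors the limitation noted above: a meta-search service can offer only what all target systems have in common.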

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

[Diagram: Author / Sender, Editor, Reader / Receiver]

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus relations
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
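
A minimal sketch of this use of a thesaurus, assuming the thesaurus is available as a simple lookup table with the relations BT, NT, RT and UF. The two entries are invented for illustration, and the OR syntax with quoted phrases is an assumption about the target database; check its help pages for the actual query syntax.

```python
# Hypothetical miniature thesaurus with the usual relation codes.
THESAURUS = {
    "sea": {"BT": ["water body"], "NT": ["coastal sea"], "RT": ["ocean"], "UF": ["seas"]},
    "ocean": {"BT": ["water body"], "NT": [], "RT": ["sea"], "UF": ["oceans"]},
}


def expand_query(term, relations=("UF", "NT", "RT")):
    """Return the term plus the terms reached through the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    expanded = [term.lower()]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in expanded:
                expanded.append(other)
    return expanded


def to_boolean_query(terms):
    """Combine the terms with OR, quoting multi-word terms as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = to_boolean_query(expand_query("sea"))
```

Adding synonyms and narrower terms tends to increase recall. Broader terms are left out by default here because they tend to lower precision; pass `relations=("BT",)` to see them.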

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
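
A minimal sketch of both directions of citation searching, assuming the citation data are available as a table that maps each document to the older documents it cites. The document labels are hypothetical.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
    "X": ["seed"],
    "Y": ["seed", "A"],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = set(), {seed}
    for _ in range(depth):
        frontier = {ref for doc in frontier for ref in CITES.get(doc, [])} - found
        found |= frontier
    return found


def cited_by(document):
    """What a citation index offers: the more recent documents that cite `document`."""
    return sorted(doc for doc, refs in CITES.items() if document in refs)


older = snowball("seed")   # older relevant publications
newer = cited_by("seed")   # more recent publications citing the seed
```

Snowballing needs only the reference lists of the documents at hand, while the forward direction needs an index built over many documents, which is why a citation index is required for it.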

211

***-

Information retrieval:
using citations (scheme)
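[Diagram: on a time axis running from past through now to future, snowball citation searching leads from the seed document back to older publications, and citation indexing leads forward to more recent publications.]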

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
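
A minimal sketch of estimating the number of citations received, per journal article and per author, assuming the result of a citation search is available as pairs of citing and cited articles. The article labels, author names and pairs are invented for illustration.

```python
from collections import Counter

# Hypothetical mapping from article to author.
ARTICLE_AUTHOR = {"P1": "Author A", "P2": "Author A", "P3": "Author B"}

# Hypothetical result of a citation search: (citing article, cited article).
citation_pairs = [("P4", "P1"), ("P5", "P1"), ("P5", "P2"), ("P6", "P3")]

# Number of citations received by each journal article.
per_article = Counter(cited for _, cited in citation_pairs)

# Number of citations received by each author, summed over the author's articles.
per_author = Counter(ARTICLE_AUTHOR[cited] for _, cited in citation_pairs)
```

Such counts are only an estimate: they depend on the coverage of the database used for the citation search.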

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
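
A minimal sketch that generates these variants from any one of them, so that none is forgotten when searching. It assumes that the default document of the site is named index.html, as in the example above; other sites may use another name.

```python
def url_variants(url):
    """Return the variants of a URL under which other pages may link to it."""
    base = url.split("//", 1)[-1]  # drop "http:" if present, keep host and path
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [f"//{base}/index.html", f"//{base}/", f"//{base}"]


variants = url_variants("http://web-server-computer.country/website/")
```

Each variant can then be used in a separate link search, and the result lists combined.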

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
To find web documents that cite a particular web document
or web site that we already know, we can NOT ONLY use an
Internet index / search engine with specialised searching in
hyperlink fields on web pages,
but we can ALSO search more “normally” for web
documents/pages that contain words or exact phrases
that occur in the URL of that web document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
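
A minimal sketch of turning such a URL into an exact phrase query and into a request for a search system. The host search.example and the parameter name q are hypothetical placeholders, and quoting a phrase with double quotes is an assumption; the help pages of the search system give the actual syntax.

```python
from urllib.parse import urlencode


def phrase_query(url):
    """Wrap the URL in double quotes so that it is searched as an exact phrase."""
    return f'"{url}"'


def search_request(search_page, url, parameter="q"):
    """Build the address of a search request; special characters are percent-encoded."""
    return f"{search_page}?{urlencode({parameter: phrase_query(url)})}"


known_page = "http://www.computer.com/directory/subdirectory/web-page.html"
request = search_request("http://search.example/search", known_page)
```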

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION


• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6


7

• PRACTICAL SESSION
The end


8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! has also owned 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
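
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration: the pages and URLs are invented, the "robot" step is replaced by a ready-made dictionary of collected texts, and combining the query words with an implicit AND is an assumption of this sketch, not a statement about any particular search engine.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain all words of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages that a "robot" would have collected beforehand.
pages = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library and information science",
}
index = build_index(pages)
hits = search(index, "marine science")
print(hits)
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can respond quickly.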

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
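
The two ranking criteria above (many pages point to a page, and "important" pages point to it) can be illustrated with a small iterative link-scoring sketch. The mini-web, the damping value and the number of iterations are assumptions chosen for the example; the slides do not describe the algorithm that Google actually uses, so this should be read as an illustration of the principle only.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Iteratively score pages: a page scores higher when many pages,
    or high-scoring pages, link to it."""
    pages = list(graph)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes its score on, divided over its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_score[target] += damping * score[page] / n
        score = new_score
    return score

# Hypothetical mini-web: each page maps to the pages it links to.
graph = {
    "home": ["news", "about"],
    "news": ["home"],
    "about": ["home"],
    "blog": ["home", "news"],
}
scores = link_rank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this invented example "home" receives the most links and ends up first, while "blog", which receives no links at all, ends up last.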

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
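
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the OR-group notation of the output are hypothetical and serve only to show the principle; how Google implements the tilde internally is not described here.

```python
# Hypothetical synonym table; a real system would use a large thesaurus.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

expanded = expand_query("cheap ~car rental")
print(expanded)
```

Only the word marked with the tilde is expanded; "cheap" is left as it is even though the table contains synonyms for it.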

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the Inktomi database of WWW pages
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
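
Recall and precision, as used above, can be computed for a single search when it is known which documents are relevant. The document identifiers below are invented; the sketch only shows the two standard ratios.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved documents that are relevant.
    Recall = share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).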

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
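
For readers unfamiliar with truncation, the following sketch shows what manual right-hand truncation does in systems that support it. The asterisk as truncation symbol and the small vocabulary are assumptions made for the example.

```python
def truncation_match(pattern, words):
    """Return the words matched by a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    # Without the truncation symbol only the exact word matches.
    return [w for w in words if w.lower() == pattern.lower()]

vocabulary = ["library", "libraries", "librarian", "liberty", "book"]
matches = truncation_match("librar*", vocabulary)
print(matches)
```

Without this feature, the searcher has to type the variants (library, libraries, librarian) explicitly in the query.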

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
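
A proximity operator can be understood as a test on word positions. The sketch below is a hypothetical illustration of a NEAR test with an assumed maximum distance of 3 words; systems that offer such an operator may define the distance differently.

```python
import re

def near(text, word1, word2, max_distance=3):
    """True when word1 and word2 occur within max_distance words of each other."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions1 for j in positions2)

sentence = "open access journals in marine science"
close = near(sentence, "open", "journals")
far = near(sentence, "open", "science")
print(close, far)
```

A plain Boolean AND would accept both word pairs; the proximity test accepts only the pair that occurs close together.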

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
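
Clustering on the fly can be sketched in a very simplified form: the retrieved snippets are grouped under the most widely shared word they contain, using nothing but the result list itself. This is not the method used by Vivisimo, which is not described in these slides; the stop word list, the snippets and the labelling rule are all assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "was"}

def cluster_results(query, snippets):
    """Group snippets under the most common non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenised.append(words - STOPWORDS - query_words)
    # Count in how many snippets each candidate label occurs.
    frequency = Counter(word for words in tokenised for word in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        # Pick the most widely shared word; ties are broken alphabetically.
        label = min(words, key=lambda w: (-frequency[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
snippets = [
    "Pascal programming language tutorial",
    "Blaise Pascal, philosopher and mathematician",
    "Free Pascal compiler for the Pascal programming language",
    "Pascal the philosopher: life of Blaise Pascal",
]
clusters = cluster_results("pascal", snippets)
for label, members in clusters.items():
    print(label, len(members))
```

With these invented snippets the results fall into two groups, one about the programming language and one about Blaise Pascal, without any processing of the documents before the search.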

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
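
The integration step mentioned above can be sketched as follows: ranked result lists from several search systems are interleaved, and a result that was already delivered by another system is dropped. The URLs and the crude normalisation rule are assumptions for the example; real meta-search systems may merge and rank in other ways.

```python
def normalise(url):
    """Crude URL normalisation so that trivial variants compare equal."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked lists from several engines, dropping repeats."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results returned by two primary search systems.
engine_a = ["http://www.example.org/", "http://example.com/page"]
engine_b = ["http://example.org", "http://example.net/"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

The first result of the second system is recognised as a repeat of the first result of the first system and is shown only once.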

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used
Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
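
The tracking idea above can be illustrated with a minimal sketch. It assumes that the page texts have already been fetched by some other means; the URLs, the page texts and the function names are hypothetical and are not taken from any of the services mentioned in these slides.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a fingerprint of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored: dict, current: dict) -> dict:
    """Compare fingerprints from the previous visit with the texts fetched now.

    stored:  URL -> fingerprint saved at the previous visit (updated in place)
    current: URL -> page text fetched now
    """
    report = {"new": [], "modified": []}
    for url, text in current.items():
        new_fingerprint = fingerprint(text)
        if url not in stored:
            report["new"].append(url)        # page was not known before
        elif stored[url] != new_fingerprint:
            report["modified"].append(url)   # page content has changed
        stored[url] = new_fingerprint
    return report


# Hypothetical example data (no network access is needed for this sketch).
stored = {}
first = check_pages(stored, {"http://example.org/a": "Tide tables 2005"})
second = check_pages(stored, {
    "http://example.org/a": "Tide tables 2005 (updated)",
    "http://example.org/b": "A page that did not exist before",
})
print(first)
print(second)
```

An alert service would run such a comparison at regular intervals and send the report by email; how real services detect changes is not documented here.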

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM                                                   RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic  ?
To find book titles published before 1990             ?
To find a book title through a title search           ?
To find the price of a book                           ?
To be informed regularly about new books              ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
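
As a minimal sketch of how a stored keyword profile could be matched against newly published titles: the titles, the keywords and the function names below are hypothetical, and the matching used by real bookshop systems is not known here.

```python
def matches_profile(title: str, keywords: set) -> bool:
    """True when at least one profile keyword occurs as a word in the title."""
    words = {word.strip(".,:;!?()").lower() for word in title.split()}
    return bool(words & {keyword.lower() for keyword in keywords})


def alerts(new_titles: list, keywords: set) -> list:
    """Return the new titles that fit the interest profile of the user."""
    return [title for title in new_titles if matches_profile(title, keywords)]


# Hypothetical interest profile and new titles.
profile = {"marine", "aquaculture"}
new_titles = ["Marine Ecology: An Introduction", "Cooking for Two"]
print(alerts(new_titles, profile))
```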

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
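
The principle of simultaneous searching can be sketched as follows. The two small in-memory catalogues stand in for remote library and bookshop systems; all records, identifiers and function names are hypothetical. The sketch also shows why the result depends on the availability of the target systems: a target that fails is simply skipped.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote catalogues: (identifier, title) records.
LIBRARY = [("111", "Coastal Oceanography"), ("222", "Fish Biology")]
BOOKSHOP = [("222", "Fish Biology"), ("333", "Ocean Currents")]


def make_target(records):
    """Build a search function for one catalogue (simple title word search)."""
    def search(query):
        return [r for r in records if query.lower() in r[1].lower()]
    return search


def unavailable_target(query):
    """Simulates a target system that is down."""
    raise ConnectionError("target system is not available")


def meta_search(query, targets):
    """Send one query to all targets in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(target, query) for target in targets]
        for future in futures:
            try:
                hits = future.result()
            except ConnectionError:
                continue  # skip a target that does not answer
            for identifier, title in hits:
                merged.setdefault(identifier, title)  # remove duplicates
    return sorted(merged.items())


targets = [make_target(LIBRARY), make_target(BOOKSHOP), unavailable_target]
print(meta_search("ocean", targets))
```

Only a simple title search is offered, because that is what every target in this sketch supports; this mirrors the limited search options mentioned above.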

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM                                                   RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic  Library of Congress, British Library, (Amazon)
To search for book titles published before 1990       national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general                          Library of Congress, British Library, Infoball
To find the price of a book                           Global Books in Print, Infoball, online bookshops
To be informed regularly about new books              Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal
bibliographic database

does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term with other terms:
»BT (= Broader Term): term(s) with broader meaning
»NT (= Narrower Term): term(s) with narrower meaning
»RT (= Related Term): other term(s)
»UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
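
The use of a thesaurus before searching can be sketched as a simple query expansion. The thesaurus entry below is invented for illustration and is not taken from any real thesaurus system; the function name and the OR syntax of the resulting query are also assumptions, since each database has its own query language.

```python
# Hypothetical thesaurus entry, using the relation codes BT, NT, RT and UF.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],            # synonym(s)
        "BT": ["body of water"],    # broader term(s)
        "NT": ["inland sea"],       # narrower term(s)
        "RT": ["coast"],            # related term(s)
    },
}


def expand_query(term: str, relations=("UF", "NT")) -> str:
    """Combine a term with thesaurus terms into one OR query."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in terms:
                terms.append(other)
    # Put multi-word terms between quotes so they are searched as a phrase.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("BT",)))
```

Adding synonyms and narrower terms with OR mainly increases recall; replacing a vague word by a more suitable term found in the thesaurus mainly increases precision.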

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
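
Both directions of citation searching can be sketched with a small, invented set of citation data. The document names and the function names are hypothetical; a real citation index holds the same kind of relation for millions of documents.

```python
# Hypothetical data: each document is mapped to the earlier documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "old2": [],
    "older": [],
    "recent1": ["seed"],
    "recent2": ["seed", "old1"],
}


def snowball(seed: str, references: dict) -> list:
    """Follow reference lists back in time, again and again (snow-ball effect)."""
    found = []
    queue = list(references.get(seed, []))
    while queue:
        document = queue.pop(0)
        if document != seed and document not in found:
            found.append(document)
            queue.extend(references.get(document, []))
    return found


def citing_documents(seed: str, references: dict) -> list:
    """Look forward in time: which documents contain a citation to the seed?"""
    return sorted(doc for doc, cited in references.items() if seed in cited)


print(snowball("seed", REFERENCES))
print(citing_documents("seed", REFERENCES))
```

The first function needs only the reference lists of the documents themselves; the second needs an inverted view over all documents, which is what a citation index provides.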

211

***-

Information retrieval:
using citations (scheme)
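[Scheme: a time axis (Past, Now, Future) with a seed document as starting point.
Snowball citation searching leads back in time to older publications;
citation indexing leads forward in time to more recent publications.]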

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
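
Counting citations from the results of a citation search can be sketched as follows. The citation pairs and author names are invented; the counts are only estimates, because they depend on the coverage of the database that was searched.

```python
from collections import Counter

# Hypothetical results of a citation search: (citing article, cited article).
CITATIONS = [("c1", "a1"), ("c2", "a1"), ("c3", "a2"), ("c3", "a1")]

# Hypothetical authors of the cited articles.
AUTHORS = {"a1": ["Smith"], "a2": ["Smith", "Jones"]}


def citations_per_article(citations: list) -> Counter:
    """Number of citations received by each cited article."""
    return Counter(cited for _, cited in citations)


def citations_per_author(citations: list, authors: dict) -> Counter:
    """Number of citations received by each author, summed over the articles."""
    counts = Counter()
    for _, cited in citations:
        for author in authors.get(cited, []):
            counts[author] += 1
    return counts


print(citations_per_article(CITATIONS))
print(citations_per_author(CITATIONS, AUTHORS))
```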

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
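
Generating the variants listed above can be sketched as follows. The function name is hypothetical, and the sketch assumes that the default page of the site is called index.html; the operator needed to search for links to each variant differs per search system and is therefore not included.

```python
def url_variants(url: str) -> list:
    """Return the spellings under which the same start page may be linked."""
    base = url
    if base.endswith("/index.html"):
        base = base[: -len("index.html")]   # drop the default page name
    base = base.rstrip("/")                  # drop the trailing slash
    return [base + "/index.html", base + "/", base]


for variant in url_variants("//web-server-computer.country/website/"):
    print(variant)
```

Each variant is then used in a separate link search, and the results are combined.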

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 224

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical, search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view on the citations received by scholarly articles (although its coverage is still limited in comparison
with commercial citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking reflects relevance better.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
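
The mechanism described above can be illustrated with a minimal sketch in Python. It assumes a tiny, hypothetical set of already collected pages (the example.org URLs and their texts are invented) and it combines query words with an implicit AND, which is a choice of this sketch and not a statement about any particular search engine.

```python
from collections import defaultdict
import re

# Hypothetical pages that a crawler ("robot") might have collected: URL -> text.
PAGES = {
    "http://example.org/a": "Marine biology and oceanography resources",
    "http://example.org/b": "Hydrology data for marine and coastal research",
    "http://example.org/c": "Open access journals in biology",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs that contain the word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "marine biology"))
```

The query is answered from the index only; the pages themselves are not contacted at search time, which is why such systems can respond quickly.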

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, the other leading Internet database and
search engine All the Web, and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
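
The idea that both the number and the "importance" of incoming links count can be illustrated with a textbook-style iteration. This is a simplified sketch under assumptions (the link graph, the damping value 0.85 and the number of iterations are all chosen for illustration) and it should not be read as the algorithm that Google actually runs.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; links from high-scoring pages weigh more.

    links maps each page to the list of pages it points to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally over the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: C and D each receive two links,
# but one of the links to D comes from the well-linked page C.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}
SCORES = link_rank(LINKS)
RANKING = sorted(SCORES, key=SCORES.get, reverse=True)
print(RANKING[:2])
```

In this invented graph, D ends up above C although both receive the same number of links, because one of the pages pointing to D is itself highly ranked.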

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
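
What such an automatic expansion amounts to can be sketched as follows. The synonym table and the function name are hypothetical and hand-made for this illustration; how Google selects synonyms internally is not described in these slides.

```python
# Hypothetical, hand-made synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "fishing": ["fishery", "angling"],
    "ocean": ["sea", "marine"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each word marked with a leading tilde by an OR-group of synonyms."""
    parts = []
    for word in query.split():
        if word.startswith("~") and len(word) > 1:
            base = word[1:]
            alternatives = [base] + synonyms.get(base.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the word itself.
                parts.append(base)
        else:
            parts.append(word)
    return " ".join(parts)

EXPANDED = expand_query("sustainable ~fishing policy")
print(EXPANDED)  # sustainable (fishing OR fishery OR angling) policy
```

Only the word marked with the tilde is expanded; the other words in the query are left untouched.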

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
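
Recall and precision, as these terms are commonly defined in information retrieval, can be computed as in the sketch below. The document identifiers and the relevance judgements are invented for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant in total.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]
PRECISION, RECALL = precision_recall(RETRIEVED, RELEVANT)
print(PRECISION, RECALL)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).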

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
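
For readers unfamiliar with truncation, the sketch below shows what a manual truncation sign does in systems that support it. The small vocabulary and the use of * as truncation sign are assumptions made for this illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical list of words that occur in an index.
VOCABULARY = ["fish", "fisheries", "fishing", "finish", "shellfish"]

def truncate_match(pattern, vocabulary):
    """Return the index words that match a pattern with * as truncation sign."""
    return [w for w in vocabulary if fnmatchcase(w, pattern.lower())]

print(truncate_match("fish*", VOCABULARY))  # right-hand truncation
print(truncate_match("*fish", VOCABULARY))  # left-hand truncation
```

Right-hand truncation with fish* retrieves fish, fisheries and fishing from this vocabulary; without truncation, each variant has to be typed in the query by the searcher.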

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
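
Clustering on the fly can be illustrated with a deliberately naive sketch: the retrieved snippets are grouped at search time under the word that they share with most other snippets. This is not the method used by Vivisimo, which is not described in these slides; the snippets, the stop word list and the labelling rule are all assumptions.

```python
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group result snippets under their most widely shared non-query word."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    tokenized = []
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        tokenized.append(words - query_words - STOPWORDS)
    # Document frequency: in how many snippets does each word occur?
    df = Counter(w for words in tokenized for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        # Label = most widely shared word; ties are broken alphabetically.
        label = min(words, key=lambda w: (-df[w], w)) if words else "other"
        clusters[label].append(snippet)
    return dict(clusters)

# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Blaise Pascal, philosopher and mathematician",
    "Pascal programming language tutorial",
    "The philosopher Pascal and his Pensees",
    "Free Pascal compiler for the programming language",
]
CLUSTERS = cluster_results("pascal", SNIPPETS)
for label, members in CLUSTERS.items():
    print(label, "->", members)
```

Nothing is computed before the query arrives: the grouping uses only the snippets returned for this one search, which is the sense of "on the fly" here.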

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
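
The integration step with removal of repeated results can be sketched as follows. The two ranked lists are invented (the URLs are taken from elsewhere in these slides), and the round-robin interleaving and the URL normalisation are assumptions of this sketch, not the method of any particular meta-search system.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Build a comparison key: lower-case host without 'www.' plus the path
    without trailing slash (query strings are ignored in this sketch)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists round-robin, dropping repeated URLs."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two primary search systems.
ENGINE_1 = ["http://www.onefish.org/", "http://oceanportal.org/", "http://bubl.ac.uk/link/"]
ENGINE_2 = ["http://oceanportal.org", "http://onefish.org/", "http://www.eevl.ac.uk/"]
MERGED = merge_results([ENGINE_1, ENGINE_2])
print(MERGED)
```

Six retrieved links are reduced to four, because two sites were reported by both systems under slightly different forms of the same address.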

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers)
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems ask the searcher to enter a query in words,
but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
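
The distinction between modified and new pages can be sketched in a few lines. This is only an illustration under stated assumptions, not how any of the services mentioned here works internally: the function names `fingerprint` and `check_pages` are hypothetical, the page texts are passed in directly (fetching them is not shown), and a page counts as "modified" when a hash of its text differs from the hash stored at the previous check.

```python
import hashlib
from typing import Dict, List


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored: Dict[str, str], current_pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the current page texts with the fingerprints of the previous check.

    stored        : URL -> fingerprint saved earlier (updated in place)
    current_pages : URL -> page text retrieved now (retrieval itself is not shown)
    """
    report: Dict[str, List[str]] = {"modified": [], "new": []}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url not in stored:
            report["new"].append(url)        # never seen before
        elif stored[url] != digest:
            report["modified"].append(url)   # seen before, content changed
        stored[url] = digest                 # remember for the next check
    return report
```

A real alert service would run such a check on a schedule and send the report to the subscriber by email.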

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The content of most books is (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
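
How a stored interest profile could be matched against newly published books can be sketched as follows. This is a minimal illustration, not the method of any actual bookshop: the record fields `title` and `category`, the function names and the sample data are all hypothetical.

```python
def matches_profile(book, keywords, categories):
    """True if a title word equals a profile keyword or the category is in the profile."""
    title_words = {w.strip(".,:;!?()").lower() for w in book["title"].split()}
    if title_words & {k.lower() for k in keywords}:
        return True
    return book.get("category", "").lower() in {c.lower() for c in categories}


def alerts_for(new_books, keywords, categories):
    """Return the titles of the new books that fit the stored interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, keywords, categories)]


# Hypothetical list of newly published books
new_books = [
    {"title": "Marine Ecology: An Introduction", "category": "Biology"},
    {"title": "Cooking for Two", "category": "Food"},
    {"title": "Coastal Engineering", "category": "Oceanography"},
]
print(alerts_for(new_books, keywords={"marine"}, categories={"oceanography"}))
```

The first book matches through a keyword and the third through its subject category, which corresponds to the two forms of profile listed above.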

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
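
The idea of one search action over several databases can be sketched as below. It is an illustration under assumptions only: the three "catalogues" are small hypothetical in-memory lists (real targets would be remote library or bookshop systems), and a plain substring search stands in for the limited set of search options that all targets have in common.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "catalogues"; real targets would be remote systems.
CATALOGUES = {
    "library_a": ["Marine Biology", "Introduction to Oceanography"],
    "library_b": ["Marine biology", "Fisheries Management"],
    "bookshop_c": ["Oceanography Today"],
}


def search_catalogue(name, query):
    """Substring search in one catalogue: the lowest common denominator of all targets."""
    q = query.lower()
    return [(title, name) for title in CATALOGUES[name] if q in title.lower()]


def simultaneous_search(query):
    """Send one query to all catalogues in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for hits in pool.map(lambda n: search_catalogue(n, query), CATALOGUES):
            for title, source in hits:
                # Titles that differ only in case are treated as the same book.
                merged.setdefault(title.lower(), []).append(source)
    return merged


print(simultaneous_search("marine"))
```

Merging on the lower-cased title is a deliberately crude way of removing duplicates; it also shows why the result depends on how consistently the target systems describe their records.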

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
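
The use of a thesaurus before querying a database without controlled descriptors can be sketched as follows. The thesaurus entry below is invented for illustration and is not taken from any of the systems mentioned in these slides; the `OR` syntax of the resulting query is also an assumption, since each database has its own query language.

```python
# Hypothetical thesaurus entry with the four classical relations.
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["bay", "gulf"],     # narrower terms
        "RT": ["coast"],           # related term
        "UF": ["ocean"],           # synonym (used for)
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Multi-word terms are quoted so that they are searched as exact phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query("sea"))
print(expand_query("sea", relations=("BT",)))
```

Adding synonyms and narrower terms mainly increases recall; choosing a narrower term instead of the original one is a way to increase precision.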

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
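
Both directions of citation searching can be sketched with a small, invented set of reference lists. The document names and the function names are hypothetical; a real citation index is built from the reference lists of many thousands of journals.

```python
# Hypothetical citation data: each document maps to the documents it cites.
REFERENCES = {
    "paper_2005": ["seed_2000", "paper_1995"],
    "paper_2003": ["seed_2000"],
    "seed_2000": ["paper_1995", "paper_1990"],
    "paper_1995": ["paper_1990"],
    "paper_1990": [],
}


def snowball(seed, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps (snowball effect)."""
    found, frontier = [], [seed]
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for cited in REFERENCES.get(doc, []):
                if cited not in found and cited != seed:
                    found.append(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return found


def build_citation_index(references):
    """Invert the reference lists: cited document -> documents that cite it."""
    index = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            index.setdefault(cited, []).append(citing)
    return index


print(snowball("seed_2000"))                            # older publications
print(build_citation_index(REFERENCES)["seed_2000"])    # more recent publications
```

The length of an entry in the inverted index is the number of citations received by that document, which is the quantity used when citations serve for evaluation.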

211

***-

Information retrieval:
using citations (scheme)
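[Scheme: time axis running from past to future, with the seed document at "now"]
• Snowball citation searching: from the seed document towards the past
• Citation indexing: from the seed document towards the future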

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI
(the Institute for Scientific Information):
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
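
Generating the variants to search for can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, the list of default document names is a guess (web servers can be configured differently), and the actual link-search syntax still has to be taken from the help pages of the search system used.

```python
DEFAULT_DOCUMENTS = ("index.html", "index.htm")  # assumed default file names


def url_variants(url):
    """Return common spellings of the same web address, without the "http://" part."""
    address = url.split("://", 1)[-1].rstrip("/")
    for name in DEFAULT_DOCUMENTS:
        if address.endswith("/" + name):
            address = address[: -len(name) - 1]  # drop "/index.html" or "/index.htm"
            break
    variants = [address, address + "/"]
    variants += [address + "/" + name for name in DEFAULT_DOCUMENTS]
    return variants


for variant in url_variants("http://www.example.org/website/index.html"):
    print(variant)
```

Each variant is then submitted as a separate link search, and the result sets are combined.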

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 225

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop

• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

7

• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it started offering many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
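
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of any real search engine: the pages in PAGES are hypothetical and stand in for what a crawler ("robot") would collect, and the names tokenize, build_index and search are invented for this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the index: map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages (assumption), standing in for pages collected by a robot.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Library science courses",
}

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine science")))
```

The query is answered from the index only; the pages themselves are not visited at search time, which is why the index must be kept up to date separately.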

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine Alltheweb and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part of many files in other file formats is indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
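
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a simplified link-analysis calculation. The sketch below is a generic PageRank-style power iteration, given as an assumption about how such ranking can work in principle; it is not Google's actual algorithm, and the link graph LINKS, the damping value and the function name link_rank are hypothetical.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style ranking.

    links maps each page to the list of pages it points to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph (assumption): A and B point to C, C points to D.
LINKS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

RANK = link_rank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this small graph, C scores higher than A and B because two pages point to it, and D also scores higher than A and B although only one page points to it, because that page (C) is itself "important".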

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
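
What such an automatic expansion amounts to can be sketched as follows. This is a hedged illustration only: how Google implements the tilde operator internally is not described here, the mini-thesaurus THESAURUS contains invented entries, and the function name expand_query is hypothetical.

```python
# Hypothetical mini-thesaurus (assumption): word -> list of synonyms.
THESAURUS = {
    "fish": ["fishes", "seafood"],
    "car": ["automobile", "vehicle"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Replace each ~word by an OR-group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + thesaurus.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)


print(expand_query("marine ~fish stocks"))
# prints: marine (fish OR fishes OR seafood) stocks
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.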

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
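
Recall and precision, the two measures mentioned above, can be made concrete with a small calculation. The sketch below uses the usual definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers are invented for illustration, and the function name precision_recall is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example (assumption): 4 documents retrieved, 3 documents relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]

precision, recall = precision_recall(RETRIEVED, RELEVANT)
print("precision:", precision)
print("recall:", round(recall, 2))
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were retrieved (recall about 0.67).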

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
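
For readers unfamiliar with truncation, the sketch below shows what manual truncation does in retrieval systems that support it. It is an illustration only: the wildcard notation librar* and the word list VOCABULARY are assumptions, and the function name truncation_match is hypothetical.

```python
from fnmatch import fnmatchcase


def truncation_match(pattern, vocabulary):
    """Return the index words that match a truncated query term such as 'librar*'."""
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))


# Hypothetical index vocabulary (assumption).
VOCABULARY = ["library", "libraries", "librarian", "liberty", "fish", "fishing"]

print(truncation_match("librar*", VOCABULARY))
# prints: ['librarian', 'libraries', 'library']
```

Without truncation or stemming, the searcher has to type each of these word forms separately to cover the same concept.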

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
(Diagram: User, In / Out; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems)

97

**--

Meta-search systems:
scheme 2
(Diagram: User, In / Out; Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems)

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
(Diagram: User, In / Out; Client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems)

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
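
Clustering results on the fly can be illustrated with a deliberately naive sketch. This is NOT the Vivisimo algorithm, which is not described here; it is only an assumption about one simple way to group hits after they have been retrieved: each hit is placed under the word (other than the query words) that it shares with the largest number of other hits. The hit titles, the stop word list and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Small stop word list (assumption).
STOPWORDS = {"a", "and", "for", "in", "of", "the"}


def terms(text, query_words):
    """Words of a hit, without stop words and without the query words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS - query_words


def cluster_hits(hits, query):
    """Group hits under the most widely shared non-query word of each hit."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    hit_terms = [terms(hit, query_words) for hit in hits]
    # In how many hits does each word occur?
    doc_freq = Counter(term for ts in hit_terms for term in ts)
    clusters = defaultdict(list)
    for hit, ts in zip(hits, hit_terms):
        if ts:
            # Most frequent word first; ties are broken alphabetically.
            label = min(ts, key=lambda term: (-doc_freq[term], term))
        else:
            label = "other"
        clusters[label].append(hit)
    return dict(clusters)


# Hypothetical hit titles (assumption) for the ambiguous query "pascal".
HITS = [
    "Blaise Pascal philosopher biography",
    "Pascal programming language tutorial",
    "Pascal philosopher and mathematician",
    "Free Pascal programming compiler",
]

for label, members in cluster_hits(HITS, "pascal").items():
    print(label, "->", members)
```

The grouping uses only the retrieved hits themselves, so nothing has to be prepared before the search; real systems use far more refined text analysis to choose the headings.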

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
(Diagram: User, In / Out; Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems)

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
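
The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration under stated assumptions and not the method of any particular meta-search system: the scoring rule (sum of reciprocal ranks), the crude URL normalisation (only a trailing slash is stripped) and the result lists are all hypothetical.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several search systems into one list.

    Repeated URLs are kept only once. A URL scores 1/position in every
    list where it appears, so URLs ranked high by several systems come first.
    """
    scores = {}
    for results in result_lists:
        seen = set()
        for position, url in enumerate(results, start=1):
            key = url.rstrip("/")  # crude normalisation (assumption)
            if key in seen:
                continue
            seen.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / position
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists (assumption) from two primary search systems.
ENGINE_1 = ["http://a.example/", "http://b.example/", "http://c.example/"]
ENGINE_2 = ["http://b.example", "http://d.example/"]

print(merge_results([ENGINE_1, ENGINE_2]))
# prints: ['http://b.example', 'http://a.example', 'http://d.example', 'http://c.example']
```

The page found by both systems appears only once and moves to the top of the merged list.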

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme. Labels shown:
• Internet: telnet, ftp, ...
• WWW
• Static texts in the WWW that can be indexed ( = on HTTP server computers), covered partly by Internet indexes
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files, PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
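
The difference between a modified page and a new page can be illustrated with a small sketch. It assumes that the text of each page has already been downloaded and that a fingerprint of the previous version was stored earlier; the URLs and the function names are hypothetical and do not describe how any particular alert service works.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash the page text; whitespace is collapsed so layout-only changes are ignored."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def check_pages(stored: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored fingerprints with the current text of each page."""
    report: dict[str, list[str]] = {"new": [], "modified": [], "unchanged": []}
    for url, text in current.items():
        digest = fingerprint(text)
        if url not in stored:
            report["new"].append(url)        # page not seen before
        elif stored[url] != digest:
            report["modified"].append(url)   # known page, different content
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical data: one page tracked earlier, two pages found now
stored = {"http://www.example.org/news.html": fingerprint("Old text")}
current = {
    "http://www.example.org/news.html": "New text",
    "http://www.example.org/new-page.html": "Fresh page",
}
print(check_pages(stored, current))
```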

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge)
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
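
How a stored interest profile can be matched against newly published books is sketched below. The profile, the book records and the matching rule (a keyword in the title, or a matching subject category) are assumptions made for the illustration; real services apply their own rules.

```python
def matches_profile(book: dict, profile: dict) -> bool:
    """True when a title word equals a profile keyword or the category matches."""
    title_words = set(book["title"].lower().split())
    keyword_hit = any(k.lower() in title_words for k in profile["keywords"])
    category_hit = book["category"] in profile["categories"]
    return keyword_hit or category_hit


def new_book_alerts(new_books: list[dict], profile: dict) -> list[str]:
    """Return the titles of the new books that fit the interest profile."""
    return [b["title"] for b in new_books if matches_profile(b, profile)]


# Hypothetical interest profile, as it could be stored on the server
profile = {"keywords": ["aquaculture", "estuaries"], "categories": ["Oceanography"]}

# Hypothetical list of newly published books
new_books = [
    {"title": "Aquaculture in practice", "category": "Agriculture"},
    {"title": "Waves and tides", "category": "Oceanography"},
    {"title": "Medieval trade routes", "category": "History"},
]

print(new_book_alerts(new_books, profile))
```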

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.
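
The principle of simultaneous searching, including the dependence on the availability of the target systems, can be sketched as follows. The catalogues here are small in-memory stand-ins with invented names; a real meta-search service would query remote systems, for instance through Z39.50 or through their WWW interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalogue(records, available=True):
    """Create a stand-in search function for one target catalogue."""
    def search(term):
        if not available:
            raise ConnectionError("catalogue not reachable")
        return [r for r in records if term.lower() in r.lower()]
    return search


# Hypothetical target systems; one of them is not available
CATALOGUES = {
    "library_a": make_catalogue(["Marine ecology", "Coastal management"]),
    "library_b": make_catalogue(["Introduction to marine biology"]),
    "bookshop_c": make_catalogue(["Marine ecology"], available=False),
}


def search_all(term, catalogues):
    """Send one query to all catalogues in parallel and collect the answers."""
    hits, failed = {}, []
    with ThreadPoolExecutor(max_workers=len(catalogues)) as pool:
        futures = {name: pool.submit(search, term) for name, search in catalogues.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result(timeout=5)
            except Exception:
                failed.append(name)  # target system unavailable or too slow
    return hits, failed


hits, failed = search_all("marine", CATALOGUES)
print(hits)
print("no answer from:", failed)
```

Only a simple word search is passed on to every target, which reflects the limited search options mentioned above.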

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

• Book title search in general:
Library of Congress, British Library, Infoball

• To find the price of a book:
Global Books in Print, Infoball, online bookshops

• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
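
A minimal sketch of this use of a thesaurus: the searcher looks up a term, collects the terms linked through the UF, NT and RT relations, and combines them into one query. The thesaurus entry below is invented for the illustration and is not taken from an existing thesaurus; the OR syntax is an assumption, since each database has its own query language.

```python
# Hypothetical thesaurus entry using the relations BT, NT, RT and UF
THESAURUS = {
    "sea": {
        "BT": ["body of water"],   # broader term
        "NT": ["coastal sea"],     # narrower term
        "RT": ["ocean"],           # related term
        "UF": ["seas"],            # use(d) for: synonyms and variants
    },
}


def expand_term(term, relations=("UF", "NT", "RT")):
    """Return the term followed by the terms found through the chosen relations."""
    entry = THESAURUS.get(term.lower(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_query(term, relations=("UF", "NT", "RT")):
    """Combine the expanded terms with OR (assumed query syntax)."""
    return " OR ".join(f'"{t}"' for t in expand_term(term, relations))


print(build_query("sea"))
```

Which relations to follow is a choice of the searcher: synonyms and narrower terms are the usual candidates, while broader terms widen the search considerably.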

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you

immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
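
Both directions of citation searching can be illustrated with a small citation graph. The documents A to F and their reference lists are invented for the example; in practice the backward step uses the reference lists in the publications themselves, and the forward step requires a citation index.

```python
# Hypothetical citation data: each document maps to the documents it cites
CITES = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "E": ["D"],
    "F": ["D", "E"],
}


def snowball(seed, cites):
    """Follow reference lists backwards in time, starting from the seed document."""
    found, queue = [], [seed]
    while queue:
        doc = queue.pop(0)
        for ref in cites.get(doc, []):
            if ref not in found and ref != seed:
                found.append(ref)
                queue.append(ref)  # the references of a reference are followed too
    return found


def cited_by(seed, cites):
    """Forward step, as offered by a citation index: who cites the seed document?"""
    return sorted(doc for doc, refs in cites.items() if seed in refs)


print("older documents found by snowballing:", snowball("D", CITES))
print("more recent documents citing the seed:", cited_by("D", CITES))
```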

211

***-

Information retrieval:
using citations (scheme)
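[Scheme: a time axis running from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document to the past; citation indexing leads from the seed document to more recent publications.]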

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
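
The variants listed above can be generated automatically before they are entered in a link search. The sketch below assumes that the start page of the site is named index.html or index.htm; other names occur as well, and the actual link search syntax still depends on the search system used.

```python
# Assumed names of the start page of a web site
INDEX_NAMES = ("index.html", "index.htm")


def url_variants(url: str) -> list[str]:
    """Return the three common ways in which the same site can be linked to."""
    base = url
    for name in INDEX_NAMES:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # drop the file name, keep the directory
            break
    base = base.rstrip("/")
    return [f"{base}/index.html", f"{base}/", base]


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```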

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and to locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
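
The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any real search engine: the pages in PAGES stand in for what a "robot" would have collected, the example.org URLs are invented, and the choice to combine query words with an implicit AND is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical pages that a crawler ("robot") might have collected: URL -> text.
PAGES = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "hydrology and marine science",
    "http://example.org/c": "open access journals in biology",
}


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "marine biology")))
```

The query is answered from the prepared index only; the pages themselves are not visited at search time, which is why such a system can answer quickly but may be out of date.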

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
the other leading Internet database and search engine Alltheweb,
and the Internet database Inktomi
have all been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
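
The idea that a page ranks higher when many pages, or "important" pages, point to it can be illustrated with a simplified link-analysis sketch. This is not Google's actual algorithm; it is a small PageRank-style iteration under stated assumptions: the link graph LINKS is invented, the damping value 0.85 is a conventional choice, and a page without outgoing links is assumed to spread its score evenly over all pages.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages iteratively: a page scores higher when many pages,
    or high-scoring pages, link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of the score.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C scores above A and B because two pages point to it, and D scores above A and B because the one page that points to it is itself well linked.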

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
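
What such an automatic expansion amounts to can be sketched as follows. This is an illustration only and does not show how Google implements the tilde operator: the synonym table SYNONYMS is a small hand-made assumption, and the OR notation in the output is just one way to write the expanded query.

```python
# Hypothetical, hand-made synonym table; a real system would rely on a thesaurus.
SYNONYMS = {
    "fish": ["fishes", "fishing"],
    "car": ["automobile", "auto"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word by an OR group of the word and its synonyms."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                parts.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                parts.append(word)
        else:
            parts.append(token)
    return " ".join(parts)


print(expand_query("marine ~fish stocks"))
# prints: marine (fish OR fishes OR fishing) stocks
```

Only the word marked with the tilde is expanded; the other words of the query are left untouched.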

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
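
Recall and precision, the two measures named above, can be computed as in the following minimal sketch. The document identifiers are invented for illustration, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / retrieved items;
    recall = relevant hits / all relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 documents are truly relevant.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d2", "d4", "d5"]
precision, recall = precision_recall(RETRIEVED, RELEVANT)
print(precision, round(recall, 2))
```

Here 2 of the 4 retrieved documents are relevant (precision 0.5), and 2 of the 3 relevant documents were found (recall about 0.67).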

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
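
For readers unfamiliar with truncation, the following small sketch shows what a right-truncated query term such as librar* does in systems that support it. The vocabulary is invented, and the asterisk as truncation symbol is an assumption; retrieval systems differ in the symbol they use.

```python
def truncation_match(pattern, vocabulary):
    """Return the words that match a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical list of words that occur in an index.
VOCABULARY = ["library", "libraries", "librarian", "liberty"]
print(truncation_match("librar*", VOCABULARY))
# prints: ['librarian', 'libraries', 'library']
```

Without truncation, each word form has to be entered separately in the query.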

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta-search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
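
Clustering on the fly can be illustrated with a deliberately naive sketch. It is not the method used by Vivisimo, which is not described here; the rule applied (group each result under the non-query word that it shares with the most other results) and the snippets are assumptions made for illustration.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}


def cluster_results(query, snippets):
    """Group snippets under the most widely shared non-query word they contain."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {
            w for w in snippet.lower().split()
            if w not in STOPWORDS and w not in query_words
        }
        tokenised.append(words)
    # Count in how many snippets each candidate label occurs.
    freq = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Most widely shared word wins; ties are broken alphabetically.
            label = min(words, key=lambda w: (-freq[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)


# Hypothetical result snippets for the ambiguous query "pascal".
SNIPPETS = [
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher biography",
    "Pascal programming compiler download",
    "Blaise Pascal philosopher and mathematician",
]
clusters = cluster_results("pascal", SNIPPETS)
for label, members in clusters.items():
    print(label, len(members))
```

The grouping uses only the results returned for this one query, so no pre-processing of the document collection is needed.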

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when otherwise more than 1 Internet-based
information source would have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
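
The integration of results with removal of repeated results can be sketched as follows. This is a minimal illustration, not the method of any particular meta-search system: the two result lists are invented, interleaving by rank position is an assumed merging strategy, and the normalisation of URLs is deliberately crude.

```python
def normalise(url):
    """Crude normalisation: ignore surrounding spaces and a trailing slash.
    The case of the path is kept, because paths can be case-sensitive."""
    return url.strip().rstrip("/")


def merge_results(result_lists):
    """Interleave ranked result lists and drop URLs that were already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = normalise(results[position])
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged


# Hypothetical ranked results returned by two primary search systems.
ENGINE_1 = ["http://example.org/a", "http://example.org/b/"]
ENGINE_2 = ["http://example.org/b", "http://example.org/c"]
print(merge_results([ENGINE_1, ENGINE_2]))
```

The page found by both systems appears only once in the merged list, at the position of its best rank.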

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the
meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme of the Internet (WWW, telnet, ftp, ...) showing:
»static texts in the WWW that can be indexed
( = on HTTP server computers), covered partly by Internet indexes
»databases and file archives accessible through the Internet
(CGI, ASP,...)
»information accessible only when passwords are used
»rapidly changing information, such as news
»Word files
»PDF files]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
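How such a tracking service can notice new or modified pages may be sketched as below. The sketch assumes that the service stores a fingerprint (hash) of each tracked page and compares it with the text fetched at the next check. Fetching pages and sending the alert email are left out, and the function names are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint from the last check
    current:  dict mapping URL -> page text fetched now
    Returns a sorted list of URLs that are new or modified.
    """
    alerts = []
    for url, text in current.items():
        # A missing fingerprint means a new page; a different one, a changed page.
        if previous.get(url) != fingerprint(text):
            alerts.append(url)
    return sorted(alerts)
```

Storing only fingerprints keeps the service small: the old page text itself does not have to be kept to detect a change.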

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
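A keyword-based interest profile, as described above, can be matched against new book titles along these lines. This is a minimal sketch with made-up users, keywords and titles; it is not how any particular bookshop implements its alert service.

```python
def matches_profile(title, keywords):
    """True when at least one profile keyword occurs as a word in the title."""
    words = {word.strip(".,:;!?()").lower() for word in title.split()}
    return any(keyword.lower() in words for keyword in keywords)


def new_book_alerts(new_titles, profiles):
    """Map each user to the new titles that fit that user's keyword profile."""
    return {
        user: [title for title in new_titles if matches_profile(title, keywords)]
        for user, keywords in profiles.items()
    }
```

A profile based on subject categories would work the same way, with the category assigned to each book taking the place of the words in its title.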

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The one and only, international, complete, ideal
bibliographic database does NOT exist,
but the combined strengths of the different available book
databases should be satisfactory.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not support searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender → Editor → Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video, choose not the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Relations of a term to other terms in a thesaurus:
• BT (= Broader Term): term(s) with broader meaning
• NT (= Narrower Term): term(s) with narrower meaning
• RT (= Related Term): other term(s)
• UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched has NOT been produced with
added descriptors (words and terms taken from a controlled
list), the searcher can first consult one or several thesaurus
systems to find additional and more suitable words and terms,
and then use these to formulate a query for that database
(to increase recall and precision).
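The way a searcher can use thesaurus relations to build a richer query can be sketched as follows. The thesaurus fragment for "sea" is made up for illustration, and the choice to add only UF and NT terms by default is an assumption; which relations to include depends on whether recall or precision matters more.

```python
# A tiny, made-up thesaurus fragment; real thesauri are far larger.
THESAURUS = {
    "sea": {
        "UF": ["ocean"],
        "NT": ["bay", "gulf"],
        "BT": ["body of water"],
        "RT": ["coast"],
    },
}


def expand_query(term, relations=("UF", "NT")):
    """Build an OR query from a term plus the terms found via the given relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Put multi-word terms between quotes so they are searched as a phrase.
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms tends to raise recall; leaving out broader and related terms helps to keep precision acceptable.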

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
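Both directions of citation searching can be shown on a small, invented citation network. In this sketch `CITES` maps each document to the older documents in its reference list; the document names are hypothetical. `snowball` follows reference lists backwards in time, while `cited_by` does what a citation index offers: it finds the more recent documents that cite a given one.

```python
# Hypothetical citation data: each document maps to the older documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1", "old2"],
}


def snowball(seed):
    """Follow reference lists backwards in time, collecting all older documents."""
    found, todo = set(), [seed]
    while todo:
        for reference in CITES.get(todo.pop(), []):
            if reference not in found:
                found.add(reference)
                todo.append(reference)
    return found


def cited_by(document):
    """Look forwards in time: which documents cite this one?"""
    return {doc for doc, references in CITES.items() if document in references}
```

The backward search needs only the documents themselves; the forward search needs an index over the reference lists of many documents, which is why a citation index is required for it.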

211

***-

Information retrieval:
using citations (scheme)
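[Scheme: a time axis from past through now to future, with the seed
document on it. Snowball citation searching leads from the seed document
back to older publications; citation indexing leads forward to more
recent publications that cite it.]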

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax and query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
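The variants listed above can be generated from any one of them, so that none is forgotten in a link search. This is a small sketch; the function name `url_variants` is hypothetical, and it assumes that the start page of the site is called index.html, as in the list above.

```python
def url_variants(url):
    """Return the spellings under which the same start page is commonly linked."""
    base = url
    # Strip a trailing "/index.html" or "/" to get the shortest spelling.
    for suffix in ("/index.html", "/"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [base + "/index.html", base + "/", base]
```

Each variant can then be used in a separate link query, in the syntax that the chosen search system requires.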

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!



2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!
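As an illustration only, the tree structure of a subject directory can be sketched as nested categories. The category names and example.org addresses below are made up; real directories are far larger and often allow cross-links between branches.

```python
# Hypothetical mini directory: categories nest, leaves hold site addresses.
DIRECTORY = {
    "Science": {
        "Biology": {"Marine biology": ["http://example.org/marine"]},
        "Earth sciences": {"Hydrology": ["http://example.org/hydro"]},
    },
    "Reference": {"Dictionaries": ["http://example.org/dict"]},
}

def browse(tree, path):
    """Follow a path of category names; return subcategories or sites."""
    node = tree
    for category in path:
        node = node[category]
    # A dict means more categories to choose from; a list means sites.
    return sorted(node) if isinstance(node, dict) else list(node)

print(browse(DIRECTORY, []))
print(browse(DIRECTORY, ["Science"]))
print(browse(DIRECTORY, ["Science", "Biology", "Marine biology"]))
```

Browsing needs no query: the user only chooses among the categories offered at each level.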

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it added many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links a site receives, the
higher it ranks in the directory, in the hope that this
ranking is more relevant.
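The difference between alphabetical ordering and link-based ordering can be sketched as follows. The site names and link counts are invented, and counting inbound links is only the simplified view given on this slide, not Google's actual method.

```python
def rank_directory_entries(entries, inbound_links):
    """Order entries by inbound link count, highest first; ties alphabetical."""
    return sorted(entries, key=lambda site: (-inbound_links.get(site, 0), site))

# Invented entries of one directory category, with invented link counts.
entries = ["alpha.example", "beta.example", "gamma.example"]
inbound_links = {"alpha.example": 3, "beta.example": 40, "gamma.example": 12}

print(sorted(entries))                                  # alphabetical order
print(rank_directory_entries(entries, inbound_links))   # link-based order
```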

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

[Diagram: the complete WWW; a global subject directory can lead to
» a directory limited to sources in/of a country or region
» a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
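The two components described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the three pages are made up and stand in for what a crawler would collect, and real systems add ranking, phrase handling and much more.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

pages = {  # made-up pages standing in for what the robot has collected
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Open access journals in biology",
}
index = build_index(pages)
print(sorted(search(index, "marine biology")))
```

The query is answered from the prepared index only; no page is fetched from the Internet at search time, which is why such systems can respond quickly.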

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC, including AltaVista, became a part of the
famous PC producer Compaq.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista,
as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi,
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi have been owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching of many files that are
available through the WWW is possible.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
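The idea that a page ranks higher when many pages, or “important” pages, point to it can be sketched with a simplified PageRank-style iteration. This is not Google's actual algorithm: the link graph is invented, and the damping value 0.85 and the number of iterations are conventional assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages: links from many pages, or from high-scoring pages, count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, divided over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {  # hypothetical link graph: page -> pages it links to
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}
ranks = link_rank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this graph C scores higher than A or B because two pages point to it, and D scores higher than A or B because the page pointing to it is itself well linked.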

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This has worked since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
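What such an automatic expansion amounts to can be sketched as follows. The synonym table is tiny and hand-made, for illustration only; how Google selects synonyms internally is not known here, and the OR notation in the output is just one way to write the expanded query.

```python
SYNONYMS = {  # tiny hand-made table, for illustration only
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Replace each ~word with an OR group of the word and its synonyms."""
    expanded = []
    for term in query.split():
        if term.startswith("~") and len(term) > 1:
            word = term[1:]
            alternatives = [word] + synonyms.get(word.lower(), [])
            if len(alternatives) > 1:
                expanded.append("(" + " OR ".join(alternatives) + ")")
            else:
                # No synonyms known: keep the plain word.
                expanded.append(word)
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("~cheap car rental"))
```

Only the word marked with the tilde is expanded; the other words of the query are left unchanged.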

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These have all been owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage


Internet indexes do not cover all static documents on the
WWW.



Most indexes grow and their “size ranking” is variable.



If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the INKTOMI database of WWW pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
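The two quality measures named above, recall and precision, can be computed as in the following sketch. The document identifiers are invented; in practice the set of all relevant documents is rarely known exactly, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d2", "d3", "d4"]   # what the search system returned
relevant = ["d2", "d4", "d5"]          # what the user actually needed
precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 3 relevant documents were retrieved (recall 0.67).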

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
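For readers unfamiliar with truncation, the following sketch shows what a right-truncated search term such as librar* does in systems that support it. The word list is invented, and the asterisk is only one common notation for the truncation symbol.

```python
def truncation_matches(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    # Without the truncation symbol only the exact word matches.
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

vocabulary = ["library", "libraries", "librarian", "liberty"]
print(truncation_matches("librar*", vocabulary))
print(truncation_matches("library", vocabulary))
```

Without truncation or stemming, the searcher has to type the variants (library, libraries, librarian…) explicitly in the query.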

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
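A proximity operator can be sketched as follows, to show what is missing. The function and its default distance of 5 words are assumptions for illustration; systems that offer NEAR each define the maximum distance in their own way.

```python
import re

def near(text, word_a, word_b, max_distance=5):
    """True when both words occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

text = "Open access journals for marine science"
print(near(text, "open", "journals", max_distance=2))
print(near(text, "open", "science", max_distance=2))
```

A plain AND query would accept both cases; the proximity condition keeps only documents in which the words occur close together.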

96

**--

Meta-search systems:
scheme 1
[Diagram: User; client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; data flows labelled In and Out]

97

**--

Meta-search systems:
scheme 2
[Diagram: User; client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; data flows labelled In and Out]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word has also other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: User; client computer + WWW client program; WWW server computer; Internet / WWW; WWW server computers with Internet search systems; data flows labelled In and Out]

100

**--

Meta-search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages

101

**--Examples

Meta-search systems:
server-based systems












http://www.all4one.com
http://www.bytesearch.com
http://www.cyber411.com
http://www.dogpile.com = http://dogpile.com/
http://www.go2net.com = http://www.metacrawler.com
http://www.kartoo.com
http://www.mamma.com
http://www.museseek.com
http://www.profusion.com
http://www.search.com
http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
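Clustering on the fly can be illustrated with a deliberately naive sketch that groups result snippets by the word they share most widely. This is not Vivisimo's method, which is not described here; the snippets, the stop-word list and the labelling rule are all assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "by", "on"}

def cluster_results(query, snippets):
    """Group snippets under the non-query word they share most widely."""
    query_words = set(query.lower().split())
    words_per_snippet = []
    doc_freq = defaultdict(int)
    for text in snippets:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS - query_words
        words_per_snippet.append(words)
        for w in words:
            doc_freq[w] += 1
    clusters = defaultdict(list)
    for text, words in zip(snippets, words_per_snippet):
        # Label = the snippet's word found in the most snippets; ties go to
        # the alphabetically first word, so the result is deterministic.
        label = min(words, key=lambda w: (-doc_freq[w], w)) if words else "other"
        clusters[label].append(text)
    return dict(clusters)

snippets = [  # invented result snippets for the ambiguous query "pascal"
    "Pascal programming language tutorial",
    "Blaise Pascal philosopher and mathematician",
    "Free Pascal compiler for the programming language",
    "Pascal philosopher biography",
]
for label, group in cluster_results("pascal", snippets).items():
    print(label, "->", group)
```

Only the retrieved snippets are analysed, after the search, so no pre-processing of the whole document collection is needed.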

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram: User; client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems; data flows labelled In and Out]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information
source would otherwise have to be searched one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
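
The integration step with removal of repeated results might look like the sketch below. It assumes each primary search system returns a ranked list of URLs; the helper names `normalize` and `merge_results` are hypothetical, and treating `www.` hosts and trailing slashes as equivalent is an assumption made for this example only.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key: lower-case host, no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):  # assumption: www.host and host are the same site
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Interleave ranked result lists and drop repeated URLs."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged
```

Interleaving by rank is only one possible merging policy; a real meta-search system may weight the primary systems differently.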

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search
systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison

Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of the Internet

The “deep, hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: the Internet (telnet, ftp, ...) contains the WWW. Labelled areas:
• static texts in the WWW that can be indexed (= on HTTP server computers), covered partly by Internet indexes
• Word files and PDF files
• databases and file archives accessible through the Internet (CGI, ASP, ...)
• information accessible only when passwords are used
• rapidly changing information, such as news]

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require a search in words from the
searcher, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
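
The difference between a modified page and a new page can be sketched as a comparison of two snapshots. This is an illustration only, not how any particular alert service works; `fingerprint` and `classify_pages` are hypothetical names, and the snapshots are assumed to be dictionaries that map a URL to the page text already fetched.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring layout whitespace."""
    canonical = " ".join(page_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_pages(previous, current):
    """Compare two snapshots {url: text}; report new and modified URLs."""
    new = sorted(url for url in current if url not in previous)
    modified = sorted(
        url
        for url in current
        if url in previous
        and fingerprint(current[url]) != fingerprint(previous[url])
    )
    return {"new": new, "modified": modified}
```

Storing only the digest of each page, rather than its full text, is enough to detect that something changed, though not what changed.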

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops selling music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The two most recent years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
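
Matching new books against a stored interest profile can be sketched as below. This is a minimal sketch, assuming the profile is a list of keywords and each book is a dictionary with a `title` and optional `subjects`; the function name `matching_books` and the record layout are hypothetical and do not describe the service of any bookshop.

```python
def matching_books(profile_keywords, new_books):
    """Return titles of new books whose title or subjects match the profile."""
    wanted = {k.lower() for k in profile_keywords}
    hits = []
    for book in new_books:
        text = " ".join([book["title"]] + book.get("subjects", [])).lower()
        words = set(text.split())
        # A book matches when it shares at least one word with the profile.
        if wanted & words:
            hits.append(book["title"])
    return hits
```

An alert service would run such a match each time new records are added and send the hits to the subscriber by email.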

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS
• To find book titles about a specific subject / topic: Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990: national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general: Library of Congress, British Library, Infoball
• To find the price of a book: Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books: Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching with a classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the tables of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (either artwork or photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (either artwork or photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; do not choose the normal text search,
but IMAGES in the user interface.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in the formats AIFF, AU and WAV.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term(s) with broader meaning

BT (= Broader Term)
RT (= Related Term)
UF (= Use(d) For)
Other term(s)
Term
Synonym(s)
NT (= Narrower Term)
Term(s) with narrower meaning

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
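
The use of a thesaurus before searching a database without controlled descriptors can be sketched as query expansion. The entries in `THESAURUS` below are invented for illustration and are not taken from any real thesaurus; `expand_query` is a hypothetical helper that builds a Boolean OR query from a term and the relations described earlier (BT, NT, RT, UF).

```python
THESAURUS = {  # hypothetical entries for illustration only
    "sea": {
        "BT": ["water body"],
        "NT": ["bay"],
        "RT": ["coast"],
        "UF": ["ocean"],
    },
}


def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build a Boolean OR query from a term and selected thesaurus relations."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Quote multi-word terms so they are searched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

Adding synonyms and narrower terms tends to raise recall, while replacing a vague word with a more suitable term tends to raise precision; which relations to include is a choice left to the searcher.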

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
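
Both directions of citation searching can be sketched with a small data structure. The reference lists in `REFERENCES` are invented; `snowball` follows the citations in a seed document back to older work, and `build_citation_index` inverts the reference lists so that more recent citing documents can be looked up. Both function names are hypothetical and this is not how any commercial citation index is implemented.

```python
from collections import defaultdict

# Hypothetical data: each document maps to the older documents it cites.
REFERENCES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def build_citation_index(references):
    """Invert the reference lists: cited document -> citing documents."""
    index = defaultdict(list)
    for citing, cited_docs in references.items():
        for cited in cited_docs:
            index[cited].append(citing)
    return dict(index)


def snowball(seed, references):
    """Follow references backwards in time, transitively, from the seed."""
    found, todo = [], list(references.get(seed, []))
    while todo:
        doc = todo.pop(0)
        if doc not in found and doc != seed:
            found.append(doc)
            # The snow-ball effect: also follow the references of each find.
            todo.extend(references.get(doc, []))
    return found
```

The snowball search needs only the documents themselves, whereas the forward search needs an index compiled over many publications, which is why citation indexes exist as separate databases.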

211

***-

Information retrieval:
using citations (scheme)
Seed
document

Snowball
citation
searching

Now

Citation
indexing

Past

Future

Time

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
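
The variants listed above can be generated automatically before running a link search for each of them. A minimal sketch, assuming `index.html` is the only default file name in use; `url_variants` is a hypothetical helper, and the query syntax of the search system itself still has to be taken from its help pages.

```python
def url_variants(url, index_names=("index.html",)):
    """Return the equivalent spellings of a site URL to try in a link search."""
    base = url
    for name in index_names:
        if base.endswith("/" + name):
            base = base[: -len(name)]  # drop the file name, keep the slash
            break
    base = base.rstrip("/")
    variants = [base + "/" + name for name in index_names]
    variants += [base + "/", base]
    return variants
```

The number of citing pages found for each variant can then be combined, taking care not to count the same citing page twice.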

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 228

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and Eric for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine, Google Scholar, even offers a view of the citations received by scholarly articles (although still limited in comparison with commercial
citation databases).

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

Contents / summary / structure / overview of this tutorial / workshop:
• types of information sources
• dictionaries and encyclopedias
• Internet subject directories for browsing
+ Internet indexes for text searching
+ meta-search systems
• current awareness systems

5

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
• the invisible web and how to exploit its contents, even
though it is hidden away from text search systems
• finding books
• finding journal articles
• open access electronic journals
• finding images/pictures and multimedia
• thesaurus systems for better information retrieval
• citation searching

6

Contents / summary / structure / overview of this tutorial / workshop

7

Contents / summary / structure / overview of this tutorial / workshop:
• PRACTICAL SESSION
The end

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented meta-information offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually by many people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real full-text
Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it offered many other services and
became one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The content is also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories exist also.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems
[Diagram: from the complete WWW, a global subject directory can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)]

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
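The two components described above (an index built automatically from page texts, and a search function on top of it) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the pages are given as a small in-memory dictionary instead of being collected by a robot, words are split on whitespace, and all names (`build_index`, `search`, the example.org URLs) are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stand-in for pages that a crawler would have collected.
pages = {
    "http://example.org/a": "marine biology of the North Sea",
    "http://example.org/b": "marine hydrology and coastal engineering",
    "http://example.org/c": "history of the encyclopedia",
}
index = build_index(pages)
print(search(index, "marine biology"))
```

The query is answered from the prepared index, not by visiting the pages at query time, which is why such systems can respond quickly.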

49

**--

Internet indexes:
A9
• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous pc producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista, as well as the other leading
Internet database and search engine All the Web and the
Internet database Inktomi, has been owned by one U.S.
Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered; the first
part of many files in other file formats is also indexed,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query, up to 2005.)
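The idea that a page ranks higher when many pages, and especially “important” pages, point to it can be illustrated with a simplified iteration in the style of PageRank. This sketch is not Google's actual ranking method; the damping value, the number of iterations, the function name `link_rank` and the four-page link graph are all assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by the links they receive, weighting links from
    high-scoring pages more heavily (simplified PageRank-style iteration)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D link to C; C links to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this small graph C scores highest because three pages point to it, and A scores above B and D because its single incoming link comes from the high-scoring page C.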

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.
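Expanding a query with thesaurus terms can be sketched as follows. The thesaurus entries, the function name `expand_with_synonyms` and the use of parentheses with OR are assumptions for illustration; the exact Boolean syntax differs per search system, and multi-word synonyms would additionally need phrase quotes.

```python
def expand_with_synonyms(query_words, thesaurus):
    """Replace each word by an OR-group of the word and its listed synonyms."""
    groups = []
    for word in query_words:
        terms = [word] + [s for s in thesaurus.get(word, []) if s != word]
        if len(terms) == 1:
            groups.append(word)
        else:
            groups.append("(" + " OR ".join(terms) + ")")
    return " ".join(groups)

# Hypothetical thesaurus entry, entered by hand.
thesaurus = {"fishing": ["fishery", "angling"]}
query = expand_with_synonyms(["marine", "fishing"], thesaurus)
print(query)
```

The expanded query keeps the original words and adds the alternatives, so documents that use a synonym instead of the original term can also be retrieved.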

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google offers also
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/

77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system has been offered that
competes with other similar systems.

86

****

Internet indexes:
coverage
• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003
• Most indexes grow and their “size ranking” is variable.
• The biggest systems in 2003:
»Google
»AltaVista
»All the Web (serving also Lycos)
»Systems based on the INKTOMI database of WWW
pages.
»(Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005
• Most indexes grow and their “size ranking” is variable.
• The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
»Google
»Yahoo!
»(Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
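Recall and precision, the two quality measures named on this slide, can be computed for a single search when the set of relevant documents is known. The sketch below uses the standard definitions; the document identifiers and the function name `precision_recall` are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 documents are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).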

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
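To show what manual truncation means in systems that do support it, the sketch below matches a right-truncated term against the words of an index. The asterisk as truncation symbol, the function name `truncation_matches` and the small vocabulary are assumptions for illustration; the symbol differs between retrieval systems.

```python
def truncation_matches(vocabulary, pattern):
    """Return the indexed words matched by a right-truncated term
    such as 'librar*'; without '*', only the exact word matches."""
    if not pattern.endswith("*"):
        return sorted(w for w in vocabulary if w == pattern)
    stem = pattern[:-1]
    return sorted(w for w in vocabulary if w.startswith(stem))

# Hypothetical set of words found in an index.
vocabulary = {"library", "libraries", "librarian", "liberty", "marine"}
matches = truncation_matches(vocabulary, "librar*")
print(matches)
```

In a system without truncation, the searcher has to enter each of these word forms separately, for instance combined with OR.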

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta-search systems:
scheme 1
[Diagram: the user (in / out) works with a client computer + WWW client program, connected to a WWW server computer, which is connected through the Internet / WWW to the WWW server computers with Internet search systems.]

97

**--

Meta-search systems:
scheme 2
[Diagram: the user (in / out) works with a client computer + multi-threaded Internet search client program, connected through the Internet / WWW to the WWW server computers with Internet search systems.]

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
= “multiple search systems”
= “multi-search systems”
= “meta-search systems”
= “intelligent search agents”
= “federated search systems”
= “portals” (but this word also has other meanings)
...

99

**--

Meta-search systems: server-based:
scheme
[Diagram: the user (in / out) works with a client computer + WWW client program, connected to a WWW server computer, which is connected through the Internet / WWW to the WWW server computers with Internet search systems.]

100

**--

Meta-search systems:
relations
[Diagram: the user queries an Internet meta-search system, which queries Internet search system 1 and Internet search system 2; each of these has its own collected database (1 and 2) of WWW pages.]

101

**--Examples

Meta-search systems:
server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
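Grouping results under headings at search time can be illustrated with a very simple approach: label each result with the most widely shared term among the results, ignoring the query words. This is only a toy sketch and not the method used by Vivisimo; the stop word list, the function name `cluster_results` and the result titles are assumptions. The example reuses the ambiguous query “pascal”.

```python
from collections import Counter, defaultdict

# Assumed minimal stop word list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for"}

def cluster_results(snippets, query):
    """Group result snippets under the most widely shared non-query term."""
    query_words = set(query.lower().split())
    tokenised = []
    for snippet in snippets:
        words = {w for w in snippet.lower().split()
                 if w not in STOPWORDS and w not in query_words}
        tokenised.append(words)
    # In how many snippets does each term occur?
    counts = Counter(w for words in tokenised for w in words)
    clusters = defaultdict(list)
    for snippet, words in zip(snippets, tokenised):
        if words:
            # Most widely shared term; ties are broken alphabetically.
            label = min(words, key=lambda w: (-counts[w], w))
        else:
            label = "other"
        clusters[label].append(snippet)
    return dict(clusters)

snippets = [
    "Pascal programming language tutorial",
    "Free Pascal programming compiler",
    "Blaise Pascal philosopher biography",
    "Pascal philosopher and mathematician",
]
clusters = cluster_results(snippets, "pascal")
for label, items in clusters.items():
    print(label, items)
```

The grouping is computed only from the retrieved results, so no pre-processing of the documents is needed before the search.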

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Scheme: the User works at a Client computer + Multi-threaded Internet search client program, which communicates (In / Out) through the Internet / WWW directly with the WWW server computers with Internet search systems.]

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source would otherwise have to be used one after
the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.
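
The time saving can be sketched in Python as follows. This is a minimal illustration, not the implementation of any real meta-search system: the two search functions are hypothetical stand-ins that return fixed lists instead of contacting real services over the Internet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for primary search systems (assumption: a real
# system would send the query over the Internet instead).
def search_system_1(query):
    return [f"http://example.org/a?q={query}", f"http://example.org/b?q={query}"]

def search_system_2(query):
    return [f"http://example.org/b?q={query}", f"http://example.org/c?q={query}"]

def meta_search(query, systems):
    """Send the same query to every system in parallel and collect the hits per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {system.__name__: pool.submit(system, query) for system in systems}
        return {name: future.result() for name, future in futures.items()}

results = meta_search("plankton", [search_system_1, search_system_2])
```

The user formulates one query through one interface. The waiting time is then roughly that of the slowest source rather than the sum over all sources.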

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
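
Such an integration step might look like the following Python sketch. It is an assumption-laden illustration: the round-robin merging and the crude URL normalisation are choices made for this example, not the method of any system named in these slides.

```python
def normalise(url):
    """Crude normalisation (assumption): lower-case and drop a trailing slash."""
    return url.lower().rstrip("/")

def merge_results(result_lists):
    """Interleave ranked result lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

list_1 = ["http://dogpile.com/", "http://www.kartoo.com"]
list_2 = ["http://dogpile.com", "http://www.mamma.com"]
merged = merge_results([list_1, list_2])
```

Here the two spellings of the Dogpile address are recognised as the same result, so the merged list contains three entries instead of four.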

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
[Scheme: what is accessible through the Internet / WWW]
• telnet, ftp, ...
• Databases and file archives accessible through the Internet (CGI, ASP,...)
• Static texts in the WWW that can be indexed ( = on HTTP server computers):
covered partly by Internet indexes
• Information accessible only when passwords are used
• Rapidly changing information, such as news
• Word files, PDF files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to formulate a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
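
How a tracking program can notice that a page is new or modified can be sketched in Python. This is a minimal illustration, assuming that the page text has already been downloaded. It compares a stored fingerprint with a fresh one and does not describe how Copernic or any alert service actually works.

```python
import hashlib

def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring differences in whitespace."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def changed_pages(stored, current_pages):
    """Return the URLs that are new or modified, and update the stored digests."""
    alerts = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if stored.get(url) != digest:
            alerts.append(url)
            stored[url] = digest
    return alerts

# Hypothetical URL and page texts, used only for the illustration.
stored = {}
first = changed_pages(stored, {"http://example.org/news": "Version 1"})
second = changed_pages(stored, {"http://example.org/news": "Version  1"})
third = changed_pages(stored, {"http://example.org/news": "Version 2"})
```

The first check reports the page as new, the second reports nothing because only whitespace differs, and the third reports a modification. A real service would then send the alert by email.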

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM

RECOMMENDED SYSTEMS

To find book titles
about a specific subject / topic

?

To find book titles
published before 1990

?

To find a book title
through a title search

?

To find the price
of a book

?

To be informed regularly
about new books

?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ explains that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)

To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks

Book title search in general:
Library of Congress, British Library, Infoball

To find the price of a book:
Global Books in Print, Infoball, online bookshops

To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired UnCover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
[Scheme: relations around a Term]
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
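
This use of a thesaurus can be sketched in Python as follows. The thesaurus entries below are invented for the illustration, and the OR syntax with quoted phrases is an assumption: the query syntax of the target database has to be checked in its own help pages.

```python
# Hypothetical thesaurus entries (assumption: illustrative terms only).
THESAURUS = {
    "sea": {
        "UF": ["ocean"],           # synonyms
        "NT": ["coastal waters"],  # narrower terms
        "BT": ["water body"],      # broader terms
        "RT": ["tide"],            # related terms
    },
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the thesaurus terms found for it."""
    terms = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    # Put phrases of more than one word between quotes.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)

query = expand_query("sea", THESAURUS)
```

Adding synonyms and narrower terms mainly increases recall. Broader and related terms are left out by default in this sketch because they tend to lower precision.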

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
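
Both directions can be sketched in Python with a small, invented set of citation data. The document names are hypothetical, and a real citation index is of course far larger and is queried through its own interface.

```python
# Hypothetical citation data: each document is mapped to the documents it cites.
CITES = {
    "seed": ["old_1", "old_2"],
    "old_1": ["older"],
    "new_1": ["seed"],
    "new_2": ["seed", "old_1"],
}

def snowball(seed, cites):
    """Follow the reference lists backwards in time, starting from the seed document."""
    found, queue = [], list(cites.get(seed, []))
    while queue:
        doc = queue.pop(0)
        if doc != seed and doc not in found:
            found.append(doc)
            queue.extend(cites.get(doc, []))
    return found

def citing(target, cites):
    """Citation-index lookup: which documents contain a citation to the target?"""
    return sorted(doc for doc, refs in cites.items() if target in refs)

older = snowball("seed", CITES)
newer = citing("seed", CITES)
```

The snowball search needs only the reference lists of the documents themselves. The forward search needs an index over the reference lists of many documents, which is exactly what a citation index provides.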

211

***-

Information retrieval:
using citations (scheme)
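[Scheme: on a time axis (Past, Now, Future), snowball citation searching leads from the seed document towards the past; citation indexing leads from the seed document towards the future.]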

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson ISI,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search for journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page as
interesting enough to make a link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax, query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
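
Generating these variants can be sketched in Python. The sketch assumes that index.html is the default document name on the server, which differs between web servers.

```python
def url_variants(url):
    """Return the common spellings under which the same page may be linked."""
    base = url.rstrip("/")
    # Assumption: index.html is the default document name on this server.
    if base.endswith("/index.html"):
        base = base[: -len("/index.html")]
    return [base + "/index.html", base + "/", base]

variants = url_variants("//web-server-computer.country/website/")
```

Each variant can then be used in a separate link search, after which the results are combined.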

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a specific,
particular web document or web site that we already
know, then we can NOT ONLY use an internet index
search engine with specialised searching in hyperlink
fields on web pages,
but we can ALSO search more “normally” for web
documents/pages, that contain words or exact phrases
that occur in the URL of that specific, particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!


Slide 229

1

Scientific information sources
accessible free of charge

[email protected]
Vrije Universiteit Brussel
+ Information and Library Science, University of Antwerp
Belgium
Presented at the follow-up conference
organised by ECOMAMA,
in Oostende, May 2005

Abstract / Summary / Overview:
Besides the more classical commercial, fee-based systems, an increasing number of open access sources and services is available that can be exploited to access
scientific and technical information, knowledge, ideas and so on. This contribution gives an overview of open access secondary sources and services
(subject directories, bibliographic databases, search engines… that point to primary commercial as well as open access sources, and thesaurus systems that
can improve information retrieval).
During recent years these have been identified, evaluated, incorporated in study materials of students
(http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/), integrated in the WWW site of our university library (http://www.vub.ac.be/BIBLIO/) and in the
OpenURL link generator of the university library search system.
The primary open access sources include
-billions of public access WWW pages,
-thousands of discussion groups based on electronic mail, Usenet, the WWW or combinations of these,
-many electronic journals, and
-open archives/repositories set up by scientific organisations.
Pointing out interesting ones is hardly feasible in view of the huge volume. This brings us to the secondary sources that help us to create order in the expanding
information landscape. The secondary open access sources include
-numerous general, horizontal subject directories guiding us to WWW sites,
-more specialized, vertical subject directories guiding us to WWW sites in some specific subject area, such as onefish on fishing by FAO, and oceanportal on
marine science by UNESCO-IOC-IODE,
-numerous general, horizontal search engines that lead us to WWW pages, such as Google, MSN Web Search, Search Yahoo; some offer categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query, such as Mooter, Teoma, and Wisenut,
-several search engines that guide us to open archives/repositories, such as Scirus and more recently Google Scholar,
-many meta-search engines that rely on existing search engines and that combine results; remarkable among these is Vivisimo which offers categorization of
results to cope with the classical problem in information retrieval of ambiguity of meaning of words in a query,
-several general, horizontal search engines / databases that allow us to find/identify articles and other documents in most areas of science and technology, such
as Article@INIST, Infotrieve, IngentaConnect, Scirus, and most recently Google Scholar,
-several vertical search engines / databases that allow us to find articles and other documents in some specific domain of science, such as Medline for
biomedical science and related areas, and ERIC for educational science and for library and information science,
-more specialised search engines to find images on the WWW, such as Google Image Search,
-a directory and database of scholarly open access journals and articles, named DOAJ,
-a search engine to find Usenet newsgroups and articles, Google Groups,
-several huge databases of booksellers that offer bibliographic descriptions of books and related information, such as Amazon, and of course the book
databases/catalogues of many libraries, such as the Library of Congress and the British Library,
-a few databases that allow even full-text searches of the contents of a selection of books, such as Amazon and Google Print.
Some of the systems mentioned above even offer a current awareness service to alert users when a new document has become available that corresponds well
with the user profile stored with the system. For public access WWW pages, Google offers this service.
The other open access systems include general, horizontal thesaurus systems that can guide us in formulating queries to increase precision and recall, such as
the thesaurus incorporated in the Google web search engine, and the attractive online Visual Thesaurus.
A new search engine offers even a view on citations received by scholarly articles (although still limited in comparison with commercial citation databases),
Google Scholar.

2

3

These slides should be available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

4

• types of information
sources
• dictionaries and
encyclopedias

• Internet subject
directories for browsing
+ Internet indexes for text
searching
+ meta-search systems
• current awareness
systems

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

5

• PRACTICAL SESSION

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

• the invisible web and how
to exploit its contents, even
though it is hidden away
from text search systems
• finding books
• finding journal articles
• open access electronic
journals
• finding images/pictures
and multimedia
• thesaurus systems for
better information
retrieval
• citation searching

6

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

7

• PRACTICAL SESSION
The end

- contents
- summary
- structure
- overview

of this
tutorial /
workshop

8

-Interruptions
-Questions
-Remarks
-Discussions
are welcome

9

****

Open access information
sources and services
Types of online access information systems

10

****

Types of online access information
systems: “free” versus “fee”
Public access information sources
free of charge
= “open access”

Fee-based online information services
(NOT free of charge)

11

****

Primary versus secondary
computer sources / systems / services
• Primary sources /systems /services
directly useful

• Secondary sources /systems /services
»helping to access / use the primary services
»“travel agencies”, “navigation services” ...

12

****

Open access information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW

13

****

Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.

****Example

Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/

14

****Example

Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/

15

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages

16

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites

»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/

17

****Example

Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/

18

****Example

Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.

19

20

****

Open access information
sources and services
Internet directories and indexes

21

****

Internet: subject-oriented metainformation offered via WWW
Information about information sources: in the form of
»subject guides = texts with references
»classified subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching
»collections of links or forms to the systems mentioned
above

»multi-threaded search systems = meta-search engines

22

****

Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are manually generated, man-made by many
people.

• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems belong to the most
popular and most visited sites on the WWW: e.g. Yahoo!

23

****

Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific for the particular overview.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.

24

Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible 

+ Browsing is possible (formulating a query is not needed).
+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.

25

Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.

26

****

Internet global subject directories:
searching directories with a query
• Many of the Internet directories include an index to
search their contents with a query.
• However, then the assisting classification structure is not
well exploited and the user should be aware of the
problems and difficulties of information retrieval with
natural language queries.
• Furthermore, the possibility to use the system in this way
may be confusing, as these directories are not real
full-text Internet indexes, like those provided by other search
tools.

****Example

Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.

27

***-Example

Internet global subject directories:
Yahoo! and full-text search engines
• The company Yahoo! started and became famous by
offering a WWW global subject directory.
• Afterwards it has offered many other services and has
become one of the most used WWW portals.
• Since 2003, Yahoo! also owns 3(!) Internet databases and
search engines that were among the biggest and the most
powerful:
All the Web, AltaVista, Inktomi

28

**--Example

Internet global subject directories:
Britannica
• A hypertext global subject directory can be found at
http://britannica.com/

• Entries are rated.
• Accessible free of charge.
• Combined and integrated with a great encyclopedia.

29

**--Example

Internet global subject directories:
BUBL link
• A hypertext global subject directory to more than 10 000
WWW sites for the higher education community can be
found at
http://bubl.ac.uk/link/
• Accessible free of charge.

30

**--Example

Internet global subject directories:
BUBL for marine biology

31

**--Example

Internet global subject directories:
BUBL for hydrology

32

****Example

Internet global subject directories:
dmoz Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/

• The contents are also used in other systems,
such as Google Directory and Webbrain.
• Accessible free of charge.

33

****Example

Internet global subject directories:
dmoz: screenshot

34

****Example

Internet global subject directories:
Google directory
• A hypertext global subject directory can be found starting
from
http://www.google.com/

• Accessible free of charge.
• Do not confuse this with the famous Google WWW text
search engine.

35

****Example

Internet global subject directories:
Google directory: screenshot

36

***-Example

Internet global subject directories:
Google directory
• Based on the Netscape DMOZ
Open Directory Project.
• In the DMOZ directory the links are only sorted
alphabetically, but Google offers added value:
the ranking of links in the Google version of the
directory is based on link analysis of the WWW by
Google.
Simply stated, the more links received by a site, the
higher it ranks in the directory, hoping that this
ranking is more relevant.

37

**--Example

Internet global subject directories:
Librarians' Index to the Internet
• A hypertext global subject directory can be found at
http://www.lii.org/

• Accessible free of charge.

38

***-Example

Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.

39

**--Example

Internet global subject directories:
Webbrain
• A hypertext subject directory can be found at
http://www.webbrain.com/

• Based on the Netscape DMOZ Open Directory Project.
• Uses more advanced techniques for the visualisation of
the directory contents than DMOZ or Google.
• Accessible free of charge.

40

**--Example

Internet global subject directories:
Webbrain: screenshot

41

42

**--

Internet global subject directories:
lists of directories
• Many Internet global subject directories exist,
but the ideal one is not available.

• Overviews / lists of Internet subject directories also exist.
• Example (accessible free of charge):
»http://searchengineshowdown.com/dir/

43

***-

Internet subject directories:
non-global, more specific systems

a directory limited to
sources in/of a country or region

the
complete
WWW

a global
subject
directory
can lead to

a directory restricted to
a specific subject domain
(“portal”)

***- Examples

Internet subject directories focusing
on a specific subject domain
“Specialised subject directories” or “gateways”
Examples:

• Educational materials in the USA:
»http://www.thegateway.org/
• Marine science and oceanography:
»http://oceanportal.org/
= http://ioc.unesco.org/oceanportal/

44

***- Examples

Internet subject directories focusing
on a specific subject domain
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/

• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:

»http://www.onefish.org/

45

46

****

Internet indexes:
automated search tools
• Several systems allow us to search for and locate many
items (addressable resources) in the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of computers
through the real Internet in real time and completely
when a user makes a query.
Searching in that way would be much too slow due to
limitations in the technology.

47

****

Internet indexes:
scheme of the mechanism
User searching for Internet based information
Internet client hardware and software

user interface to a search engine
Internet index search engine

Internet information source
Internet crawler and indexing system

database of Internet files, including an index

48

****

Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
machine-made on the basis of the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
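The mechanism described above (collected pages, a machine-made index, and a search function on top of it) can be illustrated with a very small Python sketch. It is a toy model under stated assumptions: the pages are given as a dictionary instead of being collected by a robot, the example.org addresses and texts are invented, and the names build_index and search are hypothetical. Real Internet indexes are far more elaborate.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build a word -> set of URLs index from the text of each page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs that contain ALL words of the query (implicit AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Invented example pages; a real system would collect these with a crawler.
PAGES = {
    "http://example.org/a": "Marine biology of the North Sea",
    "http://example.org/b": "Hydrology and marine science",
    "http://example.org/c": "Civil engineering handbook",
}
INDEX = build_index(PAGES)

if __name__ == "__main__":
    print(search(INDEX, "marine science"))
```

The query is answered from the index only, which is why such a system does not need to visit the Internet in real time when a user searches.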

49

**--

Internet indexes:
A9





• http://A9.com
• Available since 2005
• This system is offered free of charge by Amazon.com.
• It offers a hybrid service:
»It retrieves WWW pages through the Google web search
database.
»It allows full text searching in the contents of a selection of
recent books, free of charge.
(So the same results can probably be obtained by using
separately the more familiar Google web search and the
Amazon.com book search system.)

50

**--

Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/

»http://www.av.com/
»http://av.com/
• Mirror site in UK:

»http://uk.altavista.com/
»http://www.altavista.co.uk/

51

**--

Internet indexes:
AltaVista: features
• Allows full text searching of a part of the WWW.
• Offers relevance ranking of search results.
• Offers links to systems to find

»images,
»MP3 sounds, audio (music…)
»video

»news

52

**--

Internet indexes:
evolution of AltaVista as a company
• AltaVista was started by the computer producer
Digital/DEC as one of the first impressive databases of
WWW documents and as a search engine for that
database.
• Digital/DEC became a part of the famous PC producer
Compaq, including AltaVista.
• Afterwards, AltaVista became a separate company.

53

**--

Internet indexes:
AltaVista and Yahoo!
• Since 2003, AltaVista
(as well as the other leading Internet database and search
engine All the Web and the Internet database Inktomi)
has been owned by one U.S. Internet company: Yahoo!

54

**--

Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
http://alltheweb.com/

• The database was one of the biggest, at least until early
2004.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
• Offers also a module to search for pictures/images.
• Offers spelling suggestions in the search interface.

55

**--

Internet indexes:
All the Web and Yahoo!
• Since 2003, All the Web as well as the other leading
Internet database and search engine AltaVista and the
Internet database Inktomi are owned by the same U.S.
Internet company Yahoo!
• Since 2004, it seems that All the Web
»does not build its own unique database of WWW sites
anymore, but that it relies on the same WWW database as
the earlier competitor AltaVista

56

****

Internet indexes:
Google (Part 1)
• http://www.google.com/
• One of the most popular systems in
2001, 2002, 2003, 2004, 2005…

57

****

Internet indexes:
Google (Part 2)
• Full-text searching is possible of many files that are
available through the WWW.
• Not only HTML and plain text pages are covered, but also
the first part is indexed of many files in other file formats,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…

58

****

Internet indexes:
Google (Part 3)
• Also the contents of some databases can be searched.
In other words, not only static WWW pages are harvested
and made searchable.

• Many other search systems on all kinds of WWW sites are
based on Google.

59

****

Internet indexes:
Google (Part 4)
• For retrieval, an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when

»many sites/pages point to it
»“important” sites/pages point to it
• (Up to 2005, Google WWW search was limited in the sense that
a maximum of 10 words could be used in a query.)
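The two ranking criteria on this slide (many pages point to a page, and “important” pages point to it) can be illustrated with a simplified link-analysis calculation. The Python sketch below is a textbook-style iteration, NOT the actual algorithm used by Google; the function name link_rank, the damping value 0.85, the number of rounds and the small link graph are all assumptions chosen for illustration.

```python
def link_rank(links: dict[str, list[str]],
              damping: float = 0.85,
              rounds: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score over the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Invented graph: a, b and d link to c; c links to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(link_rank(graph).items(), key=lambda x: -x[1]):
        print(page, round(score, 3))
```

In this invented graph, page c receives the most links and ranks first, while page a ranks above b and d because its single incoming link comes from the highly ranked page c.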

60

**--

Internet indexes:
Google refers to a dictionary
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The dictionary can teach the user more about the
meaning of the words used in the query.

**--Example

Internet indexes:
Google refers to a dictionary: display

61

**--Example

Internet indexes:
from Google into a dictionary

62

63

***-

Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
In this way, this system can be used to expand a search
query, so that the query better covers the search concept.

***-Example

Internet indexes:
from Google into a thesaurus

64

65

***-

Internet indexes:
Google can expand a query: how?
• If you want to retrieve more documents, then you can
request Google to include synonyms of one or several of
the words in your query in an automatic way.

• This works since 2003.
• You can do this by putting a tilde
~
in front of the selected word.
• Example of a query:
word1 ~word2 word3 word4
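What such an expansion amounts to can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how Google implements it: the function name expand_query is hypothetical and the small synonym table is invented.

```python
# Invented synonym table; a real system would use a much larger resource.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[list[str]]:
    """Turn a query into groups of alternatives.

    Words inside one group are alternatives (OR); the groups are combined (AND).
    Only words marked with a leading tilde are expanded.
    """
    groups = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append([word] + synonyms.get(word.lower(), []))
        else:
            groups.append([token])
    return groups


if __name__ == "__main__":
    print(expand_query("cheap ~car rental", SYNONYMS))
```

Only the word marked with the tilde is expanded; the other words in the query are left as they were typed.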

66

***-

Internet indexes:
Google can expand a query: comment
• Of course, this is only a “quick and dirty” method.
The system does not really understand your information
need.
Manual, intellectual expansion of a query should yield
better results.
• This method does NOT work with most or all other
retrieval systems.

67

***-

?? Question ??

What is the default Boolean operator used by Google?
Why is it important to know this?

68

***-

Internet indexes:
Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November
2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

69

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way
from the public access WWW
and
from databases of some scholarly publishers that publish
»full-text, primary electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

70

***-

Take a look at
Google Scholar.

71

****

Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»its own big database to search for images/pictures
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news

»Google Scholar to search for more scholarly information
sources
»Google Print to search in the contents of books

72

****

Internet indexes:
Google as aggregator
• Google has become
a great integrator / aggregator
of systems to access information.

73

**--

Internet indexes:
Google as a company
• The important competitors of Google are
»The well-established, classical Yahoo! subject directory
system

»The Yahoo! search engine, new since 2004
»All the Web and AltaVista,
well-established Internet search engines
• These are all owned by the same U.S. company, Yahoo!,
since 2004.

74

**--Example

Internet indexes:
Hotbot
• The search interface can be found at
http://www.hotbot.com/
• You can search the WWW.

• This system uses one of several of the famous, big Internet
indexes/databases that are created by other companies, to
be selected by the user/searcher.
• Allows advanced, full Boolean searching.

75

**--Example

Internet indexes:
Lycos
• The search interface can be found at
http://www.lycos.com/
• Has been based on various indexes/databases of WWW
pages over time, so that performance has also been
variable.

76

***-Example

Internet indexes:
MSN Web Search





• Offered free of charge by Microsoft.
• You can search for WWW content.
• Available since 1998.
• Famous system,
»because the search interface can be found with the search
functions that have been built into one of the most
widespread Internet browsers,
Microsoft Internet Explorer, and
»because it is offered by http://search.msn.com/
77

***-Example

Internet indexes:
Scirus
• The search interface: http://www.scirus.com/
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system.

• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF.

78

***-Example

Internet indexes:
Scirus features
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes

»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format, only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge
»since 2005: more than 10 million patent descriptions

79

**--Example

Internet indexes:
Scirus: screenshot

80

**--Example

Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.teoma.com/

81

**--Example

Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher Blaise Pascal and
to the computer programming language:

82

**--Example

Internet indexes:
Mooter
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.
The clusters are displayed in a diagram.
• The search interface: http://mooter.com/

83

**--Example

Internet indexes:
WiseNut
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made.

• The search interface: http://www.wisenut.com/

**--Example

Internet indexes:
WiseNut: screenshot of the guide

84

85

***-Example

Internet indexes:
Yahoo!
• An Internet search system is offered through
http://www.yahoo.com/
• This is offered besides the well-established, classical
Yahoo! subject directory.
• Before 2004, the search system was provided by an
external company, most recently by Google.
Since 2004, an independent system is offered that is
competing with other similar systems.

86

****

Internet indexes:
coverage


• Internet indexes do not cover all static documents on the
WWW.
• Most indexes grow and their “size ranking” is variable.
• If exhaustive results are desired, then more than one
Internet index search system should be used.

87

**--

Internet indexes:
coverage of each index in 2003


Most indexes grow and their “size ranking” is variable.



The biggest systems in 2003:
» Google
» AltaVista
» All the Web (serving also Lycos)
» Systems based on the Inktomi database of WWW
pages.
» (Note: Yahoo! did not yet have its own search system
and relied on Google for text searches, up to 2004)

88

****

Internet indexes:
coverage of each index in 2004-2005


Most indexes grow and their “size ranking” is variable.



The biggest unique systems that rely on their own,
unique Internet index/database in 2004-2005:
» Google
» Yahoo!
» (Systems based on the Inktomi / Yahoo! database,
such as All the Web, AltaVista…)

89

**--

Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall

»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
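Recall and precision, as mentioned on this slide, are simple ratios that can be calculated when it is known which documents are relevant. The Python sketch below uses the standard definitions; the function name precision_recall and the document identifiers are invented for the example.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one result set.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example: 4 documents retrieved, 3 documents are really relevant.
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    print(precision_recall(retrieved, relevant))
```

In this invented example two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).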

90

**--

Internet indexes:
non-global, regional systems

the complete WWW
covered by
a global / international Internet index
covered by
an index limited to
sources in/of a country or region

91

**--

Internet indexes:
subject-specific, specialised systems

the complete WWW
covered by
a global / international Internet index
covered by
an Internet index limited to
sources related to a specific subject

92

***-

?? Question ??
Which factors make different Internet search engines
give different results for an identical search,
(at least in most cases)
even though they have access
to the same (all) documents on the Internet?

93

***-

?? Question ??

In spite of the high popularity and the quality
of the Google Internet index search system,
there are still limitations in the search features.
Which limitations?

94

***-

Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query

95

***-

Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism

96

**--

Meta- search systems:
scheme 1
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

97

**--

Meta- search systems:
scheme 2
Internet
WWW

WWW
server
computers
with Internet
search
systems

User
Client
computer
+
Multi-threaded
Internet search
client program
In

Out

98

**--

Meta-search systems:
terminology / vocabulary / synonyms
“multi-threaded search systems”
=

“multiple search systems”

=

“multi-search systems”

=

“meta-search systems”

=

“intelligent search agents”

=

“federated search systems”

=

“portals” (but this word has also other meanings)

...

99

**--

Meta-search systems: server-based:
scheme
Client
computer
+
WWW
client program

WWW
server
computer

Internet
WWW

WWW
server
computers
with Internet
search
systems

User

In

Out

100

**--

Meta- search systems:
relations
User
an Internet meta-search system
Internet search system 1
Internet search system
collected database 1

Internet search system 2
Internet search system
collected database 2

WWW pages
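The relations in this scheme can be made concrete with a small Python sketch of the combining step of a meta-search system. It assumes that each underlying search system has already returned a ranked list of URLs; the function name merge_results, the engine names and the URLs are invented, and the ordering rule (first by the number of systems that found a page, then by its best position) is just one possible choice.

```python
def merge_results(result_lists: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Combine ranked URL lists from several search systems into one list.

    Duplicates are removed; each URL is returned with the systems that found it.
    """
    found: dict[str, dict] = {}
    for engine, urls in result_lists.items():
        for position, url in enumerate(urls):
            entry = found.setdefault(url, {"engines": [], "best": position})
            if engine not in entry["engines"]:
                entry["engines"].append(engine)
            entry["best"] = min(entry["best"], position)
    ordered = sorted(
        found.items(),
        key=lambda item: (-len(item[1]["engines"]), item[1]["best"], item[0]),
    )
    return [(url, info["engines"]) for url, info in ordered]


if __name__ == "__main__":
    # Invented result lists of two hypothetical search systems.
    results = {
        "system1": ["http://example.org/1", "http://example.org/2", "http://example.org/3"],
        "system2": ["http://example.org/2", "http://example.org/4"],
    }
    for url, engines in merge_results(results):
        print(url, engines)
```

A page that is found by both systems appears only once in the merged list and is placed before pages found by a single system.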

101

**--Examples

Meta-search systems:
server-based systems












• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com = http://dogpile.com/
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.museseek.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com = http://vivisimo.com/

**--Example

Meta-search systems: server-based:
example: Vivisimo

102

**--Example

Meta-search systems: server-based:
example: Vivisimo
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.

• Vivisimo can accomplish this on the fly, that is WITHOUT
pre-processing the documents before the search.
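
The idea of grouping hits on the fly can be illustrated with a toy sketch. This is NOT the method used by Vivisimo, which is not described here; it simply groups result titles under the term that each title shares with most other results. The titles, the stop-word list and the function name are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "at", "for", "by"}


def cluster_by_term(titles, query):
    """Group result titles under the most widely shared non-query term they contain."""
    query_terms = set(query.lower().split())
    tokens = [
        [w for w in re.findall(r"[a-z]+", t.lower())
         if w not in STOPWORDS and w not in query_terms]
        for t in titles
    ]
    # Count in how many titles each term occurs.
    freq = defaultdict(int)
    for words in tokens:
        for w in set(words):
            freq[w] += 1
    clusters = defaultdict(list)
    for title, words in zip(titles, tokens):
        # Most widely shared term wins; ties are broken alphabetically.
        label = max(sorted(set(words)), key=lambda w: freq[w]) if words else "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Jansen painter exhibition",
    "Paintings by the painter Jansen",
    "Jansen cyclist wins race",
    "Cyclist Jansen interview",
]
clusters = cluster_by_term(titles, "Jansen")
print(clusters)
```

Only the hits retrieved for the query are analysed, so nothing has to be indexed or classified before the search.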

103

**--Example

Meta-search systems: server-based:
example: Vivisimo
• In the test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.

104

**--Example

Meta-search systems: server-based:
example: Vivisimo

105

106

**--

Have a look at Vivisimo,
a meta-search engine through the WWW,
that offers automatic
clustering=classification=categorisation=grouping
of search results.

**--Example

Meta-search systems: server-based:
example: Dogpile
• The clustering software of Vivisimo is also used on other
systems.
• Example: http://dogpile.com/

107

**--Example

Meta-search systems: server-based:
example: Kartoo

108

**--Example

Meta-search systems: server-based:
example: Kartoo
• Kartoo offers an advanced graphical user interface.
• Before you can exploit the system, reading the manual is
recommended.

109

110

**--

Meta-search systems: client-based:
scheme
[Diagram. Components: User; Client computer + multi-threaded Internet search client program; Internet / WWW; WWW server computers with Internet search systems. Arrows: In, Out.]
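
The client-based scheme can be sketched as a small program that sends one query to several search systems in parallel threads. The two "engines" below are hypothetical stand-ins that return fixed lists; a real client would send HTTP requests instead.

```python
from concurrent.futures import ThreadPoolExecutor


def search_engine_a(query):
    """Stand-in for a first Internet search system (hypothetical)."""
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def search_engine_b(query):
    """Stand-in for a second Internet search system (hypothetical)."""
    return [f"http://b.example/{query}/1"]


def meta_search(query, engines):
    """Send one query to every engine in parallel and collect the hits per engine."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: fut.result() for name, fut in futures.items()}


engines = {"A": search_engine_a, "B": search_engine_b}
results = meta_search("thesaurus", engines)
print(results)
```

Because the requests run side by side, the waiting time is roughly that of the slowest search system rather than the sum of all of them.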

111

**--Examples

Meta-search systems: client-based:
example
Example:
Copernic

http://www.copernic.com

112

**--

Meta-search systems:
advantages
+ Saves time when more than 1 Internet-based information source
would otherwise have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single source.
In other words: for the same time spent, more sources can
be covered.

+ Only 1 user interface must be learned for many sources.

113

**--

Meta-search systems:
advantages
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, with a removal of repeated results.
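
A minimal sketch of such an integration, assuming that two hits are "the same" when their URLs differ only in letter case of the host, a leading "www." or a trailing slash. Real systems may compare in other ways; the function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no 'www.', no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each URL."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalise(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_1 = ["http://www.dogpile.com/", "http://www.kartoo.com"]
engine_2 = ["http://dogpile.com", "http://www.mamma.com"]
merged = merge_results([engine_1, engine_2])
print(merged)
```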

114

**--

Meta-search systems:
disadvantages
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance the famous Google
Internet index is normally NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.

115

**--

Meta-search systems:
disadvantages
- Some specific or advanced features of the individual
search systems cannot be used through all the meta-search systems, such as:

»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...

116

****

Global Internet search tools:
a comparison
Global Internet directories
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches

Global Internet indexes
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches

Multi-threaded search systems
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information

117

BREAK
Practical work

118

***-

Internet indexes cover only a part of
the Internet: introduction (1)

The “visible” part of Internet

The “deep, hidden, invisible” part of Internet and the WWW,
(that is not searchable using a global index like Google, Yahoo!...)

119

***-

Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of
the static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search
engines try to cover, many other, quite different sources
exist, that are also available through the Internet, but
that are not incorporated in those search engines.

120

***-

Internet indexes cover only a part of
the Internet: scheme
telnet
ftp
...

Internet
WWW

Databases
and
file archives
accessible through
the Internet

CGI, ASP,...

Static texts in the WWW that can be indexed
( = on HTTP server computers),
covered partly by Internet indexes

Information accessible only
when passwords are used

Rapidly changing information,
such as news

Word
files

PDF
files

***- Example

Databases accessible over the
Internet: examples
• Medline/PubMed offers
bibliographic descriptions
of publications on
medicine, free of charge.
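
Such a database is queried directly, not through a global WWW index. As a sketch: the National Library of Medicine offers a programmatic interface to PubMed (the E-utilities, which are not mentioned on this slide). The code below only builds the request URL and does not contact the server; the search term is a hypothetical example.

```python
from urllib.parse import urlencode

# Base address of the E-utilities "esearch" service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, max_results=5):
    """Build a URL that asks PubMed for record identifiers matching a search term."""
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


url = pubmed_search_url("marine aquaculture")
print(url)
```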

121

122

***-

Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes
but also other sources accessible through the Internet:

»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups

123

***-

Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.

• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)

***-Example

Gateways to Internet databases
accessible free of charge: screenshot

124

**--Example

Gateways to Internet databases
accessible free of charge: example
• The EEVL subject directory on the WWW that is
specialised in engineering
includes EEVL Xtra,
which allows simultaneous searching of open access
databases in the field of engineering.
• Since 2005.

125

126

***-

Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter a query in
words, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves

***-Example

Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• Available from:
»http://www.askjeeves.com/
»http://www.ask.com/
»http://www.aj.com/

127

128

**--

Guides to searching the Internet
available through WWW
• Searching the Internet:
recommended sites and search techniques. [online]
Available from:
http://www.albany.edu/library/internet/search.html
• The RDN virtual training suite. [online]
Available from:
http://www.vts.rdn.ac.uk/
offers training for users with a specific academic or
professional interest.

129

****

Internet:
who owns the search tools?
In 2004, 2005
• The products of the company Yahoo! include
»the most famous global Internet subject directory

»4 (!) Internet full-text databases / search engines:
All the Web, AltaVista, Inktomi, Yahoo! Search
• The products of the company Google include

»the most famous Internet full-text search engine
»a gateway to old and new Usenet news messages

130

****

Open access information
sources and services
Current awareness services
focusing on WWW pages

131

***-

?? Question ??

How can you easily find
new pages that become accessible on the WWW
about a particular topic that is interesting for you?

132

***-

Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW or
finding new pages, is possible in an automated way,

»by using one of the available, suitable, programs loaded on
your client workstation!
example: the advanced version of Copernic that is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
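
How a change in an existing page can be detected is sketched below, under the assumption that a fingerprint (hash) of the page is stored at each visit and compared at the next visit. The page contents are hypothetical and no network access is made; a real service would download the page first.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a short, fixed-length fingerprint of the page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(stored_fingerprint, page_bytes):
    """True if the page content differs from the version seen at the last visit."""
    return fingerprint(page_bytes) != stored_fingerprint


old_page = b"<html><body>Opening hours: 9-17</body></html>"
new_page = b"<html><body>Opening hours: 9-18</body></html>"

stored = fingerprint(old_page)
print(has_changed(stored, old_page))
print(has_changed(stored, new_page))
```

Only the fingerprint has to be kept between visits, not the full page.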

133

***-

Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.

***-Example

Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.

• Works with search queries given by you that are stored on
their server computer.
• Free of charge, at least up to 2004.
• http://www.googlealert.com/
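
The principle of an alert based on a stored query can be sketched as follows: the same query is repeated at regular intervals and only the hits not seen before are reported. This is an illustration of the principle, not the actual implementation of Google Alert; the URLs are hypothetical.

```python
def new_hits(previous_urls, current_urls):
    """Return URLs in the current result list that were not seen before, in rank order."""
    seen = set(previous_urls)
    return [u for u in current_urls if u not in seen]


last_week = ["http://example.org/a", "http://example.org/b"]
this_week = ["http://example.org/c", "http://example.org/a", "http://example.org/d"]

alerts = new_hits(last_week, this_week)
print(alerts)
```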

134

***-Example

Current awareness services focusing
on WWW pages: Google Alert

135

***-Example

Current awareness services focusing
on WWW pages: directly from Google
• Since 2004.
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the popular Internet index Google.
• Works with search queries given by you that are stored on
their server computer.
• Free of charge.
• Available at http://www.google.com/
and then see the page with additional services.

136

137

***-

Apply Google Alert.

138

****

Open access information
sources and services
Public access book databases

139

****

Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.

• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.

140

****

Public access book databases:
an overview
• (Databases by publishers.)
• (Fee-based databases by commercial providers)
• (Databases of computer-based versions of books.)

• Catalogue databases
by book distributors / bookshops!
• Online public access catalogue databases of libraries

• Databases of scanned book pages (since 2004)

141

****

Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.

• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear for most
librarians or end-users.

142

***-

Suitable book databases?
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic: ?
• To find book titles published before 1990: ?
• To find a book title through a title search: ?
• To find the price of a book: ?
• To be informed regularly about new books: ?

143

****

Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.

• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).

144

****

Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
http://a9.com
Both systems allow full text searching in the contents
of a selection of recent books, free of charge.
Besides searching in books, a9.com also retrieves
WWW pages through the Google web search
database.

145

****Examples

Book databases accessible free of
charge: examples in U.S.A.
• Barnes and Noble (US):
http://www.bn.com/

146

***-Examples

Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/

• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/

147

***-Examples

Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/

148

149

****

Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of shops of books
(as well as of music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/

***-Examples

International public access
dissertation database: example
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.

150

151

***-

Full-text databases
of scanned books
• Some organisations have scanned the contents of
thousands of books to make them searchable through the
Internet.

• Examples, since 2004:
»http://www.amazon.com/ and choose BOOKS;
incorporated in the search engine A9
»http://print.google.com/ informs us that search results that
lead to a book are incorporated in the normal, classical
Google system.

152

**--Examples

Databases of links to the
full text of many books
Databases
(accessible free of charge )
of links to the full text of many books:

• http://digital.library.upenn.edu/books/
• http://wordtheque.com/

**--Examples

Collection of links to
public access book databases
• See for instance Internet directories like Yahoo! that lead
to information about books.

153

154

**--

Current awareness services
for books
• Some systems can alert the user that a new book has been
published when this fits the interest profile of the user.
• Such an interest profile can be stored on the server of the
system in the form of
»keywords, or
»subject categories / subject fields
• Example: http://www.amazon.com
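
A stored interest profile in the form of keywords can be matched against new titles as in the sketch below. The profile, the titles and the function name are hypothetical; this is not a description of how any bookshop implements its alerts.

```python
import re


def matches_profile(title, keywords):
    """True if at least one profile keyword occurs as a word in the title."""
    words = set(re.findall(r"\w+", title.lower()))
    return any(k.lower() in words for k in keywords)


profile = ["aquaculture", "mangrove"]
new_titles = [
    "Handbook of Mangrove Ecology",
    "Introduction to Library Science",
    "Sustainable Aquaculture in Practice",
]
alerts = [t for t in new_titles if matches_profile(t, profile)]
print(alerts)
```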

155

****

Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.

156

***-

Online Public Access Catalogues
of the big famous libraries
• For instance:
British Library, Library of Congress (USA)
• Their coverage is good.

• They offer the best subject descriptions.
• Access is free of charge.

• So they form excellent sources to find books about a
particular subject/topic.

157

***-

Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the local publications.

• The national libraries are the most reliable source for
bibliographic searching and verification.

158

***-

Online Public Access Catalogues:
union catalogues of libraries
• Some systems offer access to the merged catalogues of
several libraries, so-called ‘union catalogues’.
• Example:
Copac
http://www.copac.ac.uk/
is accessible free of charge.

159

**--

Online Public Access Catalogues:
simultaneous searching
• Some meta-search services allow simultaneous, parallel
searching in one search action over several databases of
libraries and bookdealers.

• The result depends on the availability and functionality of
the target systems.
• + The coverage is very good.
• - Search options are rather limited.

**--Examples

Online Public Access Catalogues:
simultaneous searching: examples
• Infoball
http://www.infoball.de
• Karlsruher Virtueller Katalog
http://www.ubka.uni-karlsruhe.de/kvk.html
• Simultaneous access to catalogues of libraries related to
water, organised by IAMSLIC, using Z39.50

160

**--Example

Online Public Access Catalogues:
simultaneous searching: examples

161

162

***-

Recommended book databases
AIM: RECOMMENDED SYSTEMS

• To find book titles about a specific subject / topic:
Library of Congress, British Library, (Amazon)
• To search for book titles published before 1990:
national libraries, Barnes&Noble, Infoball, Alapage, Abebooks
• Book title search in general:
Library of Congress, British Library, Infoball
• To find the price of a book:
Global Books in Print, Infoball, online bookshops
• To be informed regularly about new books:
Amazon, Alapage, Bol

163

***-

General conclusion
concerning book databases
The
one and only, international, complete, ideal,
bibliographic database

does NOT exist,
but the united forces of the different available book
databases should be satisfying.

164

****

Open access information
sources and services
Online access databases about journal articles

165

****

Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications.
(for instance Elsevier, Emerald, Sage)
• Only few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.

****Example

Online access databases
about journal articles: Ingenta (1)
• Ingenta allows you to search a bibliographic database of
millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.

166

****Example

Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Available from http://www.ingentaconnect.com/
• Ingenta acquired Uncover in 2000.

167

168

****

Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search in a bibliographic
database, NOT full-text, (Journal articles, journal issues,
books, reports, conferences, doctoral dissertations)
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer usage of classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.

169

****

Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of new issues of the journals that you
have selected are sent to you by email.

170

****

Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher

• The search interface: http://www.scirus.com

171

***-

Online access databases
about journal articles: Google Scholar
• Google Scholar allows us to search for more scholarly
information sources, including journal articles.
• A beta (test) version has been available since November 2004.

• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

172

***-

Online access databases
about journal articles: Google Scholar
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers that publish

»full-text, primary, electronic journals
»bibliographic, secondary, abstract databases
(including some databases of the company CSA)

173

***-

Online access databases
about journal articles: DOAJ
• The Directory of Open Access Journals
started in 2003 as a directory/database
of titles of electronic journals that can be accessed by
anyone free of charge.
• http://www.doaj.org/
• More recently, this system allows deeper searching
(down to the level of the titles and even abstracts of
journal articles)
for an increasing number of the journals that are
included in the directory.

174

***-

Directory of Open Access Journals:
screenshot

**--Example

Online access databases
about journal articles: CiteSeer
• CiteSeer allows searching a bibliographic database of
articles and other documents in the fields of information
and computer science

• Searching is free of charge
• Available from
http://citeseer.ist.psu.edu/

175

**--Example

Online access databases
about journal articles: Medline
• Medline/Pubmed produced by the
National Library of Medicine (USA)
allows searching a bibliographic database of articles in
the field of medicine.
• free of charge
• available from many sites, including
»PubMed of the National Library of Medicine (USA)
and
»Ingenta

176

177

***-

Search for titles of journal articles
that are relevant for you,
in a database provided free of charge.

178

***-

Open access information
sources and services
Electronic newsletters and journals

179

***-

Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.

Author / Sender

Editor

Reader / Receiver

180

***-

Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.

181

**--

Electronic newsletters and journals:
Open Archives Initiative
• The free flow of scientific information is hindered by
traditional distribution methods.
• As a reaction and as a partial solution, the Open Archives
Initiative movement tries to develop and to implement
alternative publishing methods that are based on
publishing through computers and the Internet.

183

****

Open access information
sources and services
Finding multimedia files on the Internet

184

****

Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge, to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches...); video

185

****

Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).

**** Examples

Finding images on the Internet:
screen shot of a Google image search

186

****Examples

Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/
• http://gallery.yahoo.com/ !

187

****Examples

Finding images on the Internet:
examples of search engines (2)
• http://images.google.com/ !
or through http://www.google.com/
The largest database in this category (at least in 2002,
2003, 2004).
For each result, not only a thumbnail is offered,
but also directly the origin with the readable URL;
this makes it easier to guess the relevance of the
document.

188

****Examples

Finding images on the Internet:
examples of search engines (3)
• http://multimedia.lycos.com/
• http://www.altavista.com/
(also audio and video; choose IMAGES in the user interface
instead of the normal text search.)

189

**--Examples

Finding images on the Internet:
examples of search engines (4)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found,
but only a few can be seen at a time.

190

**--Examples

Finding images on the Internet:
examples of search engines (5)
• http://www.ditto.com/
• http://www.picsearch.com/
Does NOT directly show the origin of each picture with a
readable URL, together with each thumbnail.

191

**--Examples

Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/

192

193

****

Use a specialised search engine
to find images
about a particular subject
on the Internet.

**--Example

Finding audio on the Internet:
example of a search engine
• http://www.findsounds.com
• Allows you to find sound files in formats aiff, au, wav.

194

**--Example

Finding audio and video on the
Internet: example of a search engine
• http://www.altavista.com/
(use the special multimedia finder)

195

**--Example

Finding video on the Internet:
example of a search engine
• http://video.search.yahoo.com/ since 2005

196

197

****

Open access information
sources and services
Thesaurus systems
for better information retrieval

198

****

Thesaurus
relations
Term
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use(d) For): synonym(s)

199

***-

Thesaurus applications
related to information searching
• For users (!) of a database:
When the database to be searched is NOT produced with
added descriptors (words and terms) that are taken from
a controlled list of words and terms, then the searcher can
use one or several thesaurus systems first, to find more
words and terms and more suitable words and terms;
then the searcher can use these found words and terms to
formulate a query for that database (to increase recall
and precision).
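
The procedure above can be sketched as a small query expansion. The thesaurus entry below is invented for illustration only (it is not taken from any real thesaurus); the relation codes BT, NT, RT and UF are the usual ones.

```python
# Hypothetical thesaurus entry, for illustration only.
THESAURUS = {
    "sea": {
        "BT": ["water body"],
        "NT": ["coastal sea"],
        "RT": ["ocean"],
        "UF": ["seas"],
    },
}


def expand(term, relations=("UF", "NT", "RT")):
    """Collect the term plus the terms linked to it by the chosen relations."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for rel in relations:
        for t in entry.get(rel, []):
            if t not in terms:
                terms.append(t)
    return terms


def to_query(terms):
    """Join terms with OR; multi-word terms are quoted as exact phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = to_query(expand("sea"))
print(query)
```

Adding synonyms, narrower and related terms is aimed at higher recall; leaving out the broader term (BT) by default avoids losing too much precision.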

200

***-

Thesaurus systems
that cover all subjects
• General systems
• Universal systems
• Covering all subjects

• Broad and shallow systems
• Horizontal systems

***-Examples

Thesaurus systems
that cover all subjects: examples (2)
• thesaurus systems that can be used free of charge through
the WWW
»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

201

**--Example

General thesaurus system through the
WWW: screenshot sea

202

**--Example

General thesaurus system through the
WWW: screenshot ocean

203

204

***-

Thesaurus systems covering all
subjects: comments
• An ideal, complete thesaurus that covers all subjects does
not exist.

205

****

Open access information
sources and services
Evolution and future trends

206

****

Online access information:
evolution and future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge (= open access)

207

****

Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.

• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.

208

****

Citations in scientific communication
Citation searching with bibliographic databases

209

***-

Information retrieval:
subject searching & citation searching
• Information retrieval is in most cases carried out in the
form of searching for a particular subject/concept by
using (key)words and terms:
subject searching
• An alternative, complementary method is searching for
relevant documents using the system of citations,
starting with a known, identified, relevant, good “seed”
document:
citation searching

210

***-

Information retrieval:
using citations
• Citations = references in publications to earlier work.
• Citations may lead to other, useful, relevant information:
starting from a particular, interesting “seed” document,
»citations in that document may lead you to older relevant
publications (+ snow-ball effect)
»citation indexes allow you to identify more recent
publications which contain citations to that particular
document
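
Both directions can be sketched on a small, invented set of documents: following the reference lists leads back in time (snowball), while looking up who cites the seed document leads forward in time (what a citation index offers). The document names and the data are hypothetical.

```python
# Hypothetical data: each document is mapped to the documents it cites.
CITES = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["seed", "old1"],
}


def snowball(doc, cites, depth=2):
    """Follow reference lists backwards in time, up to `depth` steps."""
    found, frontier = [], [doc]
    for _ in range(depth):
        next_frontier = []
        for d in frontier:
            for ref in cites.get(d, []):
                if ref not in found and ref != doc:
                    found.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return found


def citing(doc, cites):
    """Return the documents whose reference lists contain `doc`."""
    return sorted(d for d, refs in cites.items() if doc in refs)


print(snowball("seed", CITES))
print(citing("seed", CITES))
```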

211

***-

Information retrieval:
using citations (scheme)
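[Diagram: a time axis from Past through Now to Future, with the seed document on it. Snowball citation searching leads from the seed document to the past; citation indexing leads from the seed document to the future.]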

212

***-

Information retrieval:
citation indexes
• Citation indexes are produced by the company
Thomson I.S.I.,
the Institute for Scientific Information:
»Science Citation Index,
»Social Sciences Citation Index,
»Arts and Humanities Citation Index

• These have been combined in the online accessible
database Web of Science.

213

***-

Information retrieval:
Google Scholar for citations (Part 1)
• Google Scholar allows us to search and find some
scholarly information sources (documents), including
journal articles
as well as sources that cite those documents.
• Beta (test) version available since November 2004.
• The system is accessible starting from the home page of
Google as one of the additional services.
• The online manual explains the system:
http://scholar.google.com/scholar/about.html

214

***-

Information retrieval:
Google Scholar for citations (Part 2)
• The information is harvested in a more or less automatic
way from the public WWW and from databases of some
scholarly publishers,
and is processed to extract citations.

215

***-

Use Google Scholar not only
for subject searching,
but also for citation searching.

216

***-

Information retrieval: when is
citation searching appropriate?
Cited reference searching is
• appropriate mainly when you search journal articles that
contain citations of a particular document or author, for
instance to study the impact in science of a particular idea
or proposal
»in that document or
»of that author
• less appropriate when you search for papers on a more
general subject

217

****

Citations in scientific communication
Evaluating authors and journal articles,
using citations

218

***-

The number of citations received by
an author
The number of citations received by a particular author
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of published
scientific research or ideas or proposals by a particular
author
• can be estimated by a citation search

219

***-

The number of citations received by
a journal article
The number of citations received by a particular
journal article
• can be important as a factor in the evaluation of the
relevance / impact / scientific importance of a published
scientific research report or an idea or proposal published
in a particular journal article
• can be estimated by a citation search
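
Counting the citations received by an article, and summing these per author, can be sketched as follows. All data are invented for illustration; a real count depends on the coverage of the citation database that is searched.

```python
from collections import Counter

# Hypothetical data: citing document -> cited documents.
cites = {"p3": ["p1", "p2"], "p4": ["p1"], "p5": ["p1", "p2"]}
# Hypothetical data: document -> authors.
authorship = {"p1": ["Author A"], "p2": ["Author A", "Author B"], "p3": ["Author C"]}


def citation_counts(cites):
    """Count in how many reference lists each document appears."""
    counts = Counter()
    for doc, refs in cites.items():
        for ref in set(refs):
            if ref != doc:  # a document citing itself is not counted
                counts[ref] += 1
    return counts


def author_citations(author, authorship, counts):
    """Sum the citations received by all documents of one author."""
    return sum(counts[d] for d, authors in authorship.items() if author in authors)


counts = citation_counts(cites)
print(counts["p1"], counts["p2"])
print(author_citations("Author A", authorship, counts))
```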

220

****

Citations in scientific communication
Internet indexes for citation searching

221

***-

Internet indexes for citation
searching: introduction
• Some Internet indexes / search engines allow you to
search for documents / pages / URLs that link to a
particular page, to some URL that you already know
(such as one of the web pages that you have developed or
that you have made available yourself).
• Linking to a URL is similar to citing an information
source.

• Such search systems can be used to analyse web citations.
• Web citations are sometimes named “sitations”,
referring to the term “web site”.

222

***-

Internet indexes for citation
searching: applications
• Citation searching on the WWW or on an intranet can be
used
»to get an idea of the importance, the fame, the impact of a
particular web document, as measured by the number of
links/citations to that page
»to find out who has considered a particular page
interesting enough to link to
»to find comments/criticisms on a particular web document

223

***-

Internet indexes for citation
searching: query syntax
• For details about the required query syntax and query
formulation, see the online manual or help pages of the
search system that you want to use.

• Take care to search for all variants such as
»//web-server-computer.country/website/index.html
»//web-server-computer.country/website/
»//web-server-computer.country/website
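
The three variants above can be generated mechanically. The following Python sketch is only an illustration: the function name `url_variants` and the set `DEFAULT_DOCS` of default document names are assumptions, not part of any search system. It does not cover other variants, such as host names with and without "www".

```python
from urllib.parse import urlsplit

# Assumption: file names that a web server typically serves for a directory.
DEFAULT_DOCS = {"index.html", "index.htm"}


def url_variants(url):
    """Return the spellings under which the same page may be linked to."""
    parts = urlsplit(url)
    host, path = parts.netloc, parts.path
    variants = []

    def add(p):
        variant = f"//{host}{p}"
        if variant not in variants:
            variants.append(variant)

    add(path)
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_DOCS:
        directory = path[: len(path) - len(last)]  # ends with "/"
        add(directory)              # without the default document name
        add(directory.rstrip("/"))  # also without the trailing slash
    elif path.endswith("/"):
        add(path.rstrip("/"))
    return variants


for variant in url_variants("http://web-server-computer.country/website/index.html"):
    print(variant)
```

Each variant can then be submitted as a separate link query, using whatever syntax the chosen search system requires.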

***-Examples

Internet indexes for citation
searching: examples of systems
• Google
• Yahoo!
• (All the Web)

• (AltaVista)
• (HotBot)
• (Lycos)

•…

224

225

***-

Use a WWW search engine
for citation searching,
using specialised searching for links.

226

**--

Internet indexes for citation
searching: using normal searching
When we want to find web documents that cite a particular
web document or web site that we already know, we can use
an Internet index / search engine NOT ONLY with specialised
searching in the hyperlink fields of web pages,
but ALSO more “normally”, by searching for web
documents/pages that contain words or exact phrases
that occur in the URL of that particular web
document or site.
For example: searching through the WWW for the string
“http://www.computer.com/directory/subdirectory/web-page.html”
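
As a rough illustration of this "normal" searching, the Python sketch below looks for the URL string in the text of a few pages instead of in their hyperlink fields. The page names and texts are invented, and dropping the "http://" prefix before matching is an assumption made here because authors often omit it. Real search engines tokenise URLs in their own way, so results will differ.

```python
def mentioning_pages(pages, url):
    """Return the names of the pages whose text contains the URL string."""
    # Assumption: match the URL without its scheme, so that both
    # "http://www..." and "www..." spellings are found.
    needle = url.split("://", 1)[-1]
    return [name for name, text in pages.items() if needle in text]


# Hypothetical page texts (invented for illustration).
pages = {
    "page-1": "See http://www.computer.com/directory/subdirectory/web-page.html",
    "page-2": "www.computer.com/directory/subdirectory/web-page.html is useful",
    "page-3": "Nothing relevant here.",
}

found = mentioning_pages(
    pages, "http://www.computer.com/directory/subdirectory/web-page.html"
)
print(found)
```

Unlike link searching, this also finds pages that mention the URL in plain text without making it a hyperlink.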

227

**--

Use a WWW search engine
for citation searching,
NOT by using specialised searching for links,
but by using more normal searching for words
that occur in the URL of a web document or site
that you know already.

228

**--

Citation searching
using open access systems
• Not only for subject searching,
but also for citation searching,
open access systems are available.

229

Practical work

230

Questions? Suggestions?

Topics for further discussion?
Topics for co-operation?!
Inclusion of scientific information aspects
in research projects/proposals?!